I'm interested in people's thoughts comparing storing data in a traditional SQL based Database or utilising a Memory-Mapped File such as the one in the new .Net 4.0 runtime. The data in question would be arrays of simple structures.
Obvious pros and cons:
SQL Database Pros
Adhoc query support
SQL Management Tools
Schema changes (adding more columns and setting default values)
Memory-Mapped Pros
Lighter overhead? (this is an assumption on my part)
Shareable between process threads
Any others?
Is it worth it for performance gains?
You could try out MongoDB, and get a mixture of both worlds (database-like features over a memory mapped store).
MongoDB bridges the gap between
key-value stores (which are fast and
highly scalable) and traditional RDBMS
systems (which provide rich queries
and deep functionality).
Here's a good article that can walk you through installing and coding to MongoDB:
Going NoSQL with MongoDB
SQLServer can use memory mapped files if you choose "SharedMemory" as the protocol. Otherwise it'll use Pipes, TCP or VIA.
Regarding pros and cons.. to me they are amost not comparable. SQL has the whole query/multiuser/transaction etc infrastructure built in. If you store with MMF's you are on your own regarding all that. On the other hand, MMF are built in the OS.. no seed for a server/service.
Related
Currently, My web application is on Redis db(all database). it's required more than 4 GB RAM which is cost me a lot.
I want to migrate some part of my application into permanent storage DB(SQL, mongo...)
So, Can anyone tell me which is the best choice(SQL, mongo...)?
Technology stack of my application:
nodejs(express)
angularjs
redis
It really depend on your design. Is your data highly relational? Redis is considered a NoSQL technology so I guess MongoDb would be somewhat similar but implementation will be file-based instead of key-value set. If you need your data to have strong relationship between each data set then SQL family is designed for exactly that, but a lot of work is needed to build the tables first and then separate the data.
In terms of
scalability,
performance,
maintenance,
Ease of use / Learning curve
cost,
In order of significance but wouldn't mind a general answer as I appreciate I m probably asking for too much :)
Thanks
EDIT: I m looking for a database that will serve as the single authoritative data store and I need all attributes of the documents stored to be indexed for various business reasons. Therefore I know that other solutions won't do what I m looking for.
tl;dr; If you are using JavaScript and building browser apps, node.js and DocumentDB are a match made in heaven. If you are using .NET and/or other Azure services, then DocumentDB is favored. If you are using other AWS services, then SimpleDB might be better.
I know that questions like this are not ideal for Stack Overflow, but I often see value in answers like this and my most popular answer on SO is essentially informed opinion backed by evidence. I have not used SimpleDB but I looked into it before deciding on DocumentDB. I rejected it pretty quickly... although I did give AWS Lambda a serious look before deciding on DocumentDB. So:
scalability. DocumentDB has a very straight forward and explicit scaling model -- add more collections if you need either more space or more operations per second. SimpleDB's scaling model is similar except less straight forward since you add domains which are overloaded to both provide type separation (think tables) and scalability. You can scale either to whatever you need.
performance. Since I never built anything on it, I can't say anything about SimpleDB's performance. However, I've been very impressed with the performance of DocumentDB. Latency is less than 10ms for simple id-based reads and I get impressive latency and throughput for queries. The DocumentDB implementation of our current app returns complex n-dimensional aggregations (done in stored procedures on DocumentDB using documentdb-lumenize) in 1/4 the time of the functionally-equivalent MongoDB/node.js implementation. You'd have to do your own performance testing on your actuall application to have a definitive answer here.
maintenance. Both are much more hands off than traditional data stores. There just aren't that many knobs to turn maintaining either of them. SimpleDB geographically distributes your data by default. You'd have to do the equivalent manually in DocumentDB. Possible, but harder. DocumentDB has good import/export tools and their backup solution is about to be significantly upgraded.
ease of use / learning curve. If you are JavaScript programmer, than DocumentDB has a lot to recommend it. DocumentDB uses JSON natively. SimpleDB uses XML. DocumentDB has ACID-enabling stored procedures written in JavaScript. You'd need to combine SimpleDB with something else (Lambda maybe, but the XML/JavaScript mismatch would make this less than ideal) to get the equivalent. Both allow use to use SQL but DocumentDB also allows for JavaScript native queries.
There is one huge mindset hurdle that you will have to get over in order to be successful with DocumentDB. Despite the fact that they both scale by adding more domains/collections, SimpleDB domains are closer conceptually to tables. The word choice of "collection" by the DocumentDB team is unfortunate since they are more akin to partitions and should not be thought of as tables. The hard part is getting used to the idea that you store all of your different data types in the same collection. Once you get over that, I find DocumentDB's approach refreshing and incredibly flexible. I can efficiently model inheritance and type-mixins. Collections nay partitions have one purpose -- scalability. Domains are used for both scalability and data type separation which is actually harder in practice.
cost. Not much to say here. Both allow you to scale your cost gradually. For really small implementations, DocumentDB is probably more expensive since the smallest unit of usage is a single collection which is $25/month minimum. You'd have to do your own modeling/what-if analysis to determine which would be less expensive for you. Note, that Azure is being every aggressive in general and even pushing AWS to lower prices in some cases. My gut is that they would be roughly equal in cost for most applications.
Other thoughts:
You wrote, "I need all attributes of the documents stored to be indexed". One really nice feature of DocumentDB is that you can specify the size of your indexes By default, every field is indexed into a 3-byte per field hash index, which is highly space efficient. I do not know if SimpleDB has the equivalent.
This is a bit like comparing apples to oranges. I consider DocumentDB to be like MongoDB or CouchDB in it's data model and VoltDB in its use execution model (although VoltBD sprocs are written in Java). SimpleDB feels more like a simple XML object store. If you already have a big XML mindset, then it might be easier, but I think there are more folks using JSON today than XML.
Writing ACID-enabling stored procedures in JavaScript is a killer feature that only DocumentDB has. Some say the days of stored procedures are over; that you should put all such logic in your application server layer. If you implementing a simple CRUD API, that may be, but almost every application requires some sort of transaction where more than one row is changed at a time. This is mind bogglingly hard to do correctly without transaction support in your data store. Even if you do implement the equivalent of transactions with your NoSQL database, the overhead of the implementation eats away any development/performance/scalability advantages that you got by choosing NoSQL rather than SQL.
DocumentDB's user defined functions and triggers (also written in JavaScript) might be useful, although I believe the trigger implementation is crippled at this moment in time and I haven't found a use for UDFs myself yet.
DocumentDB has built-in attachment support. You need to integrate manually with S3 for the equivalent on AWS.
DocumentDB has geo indexing and operators.
SimpleDB's 1K per document limit is a serious limitation. This tells me that it's designed mostly for logging or as an index to S3 and not a full-fledged document store. The limit for DocumentDB is 512K.
If comparison to SimpleDB is like apples to oranges, then comparison to ElasticSearch is like apples to fire engines. My impression of ElasticSearch is that it's all about full-text searching and analytics. I don't think it's space/execution/api efficient to serve as a primary transactional store. Built on Lucene, it was not designed to have the reliability/durability to be your primary store. Further, even when hosted, it's more of an IaaS offering, wherease DocumentDB and SimpleDB are true PaaS offerings. The maintenance will be much higher with ElasticSearch.
RavenDB (a .Net JSON storage storage db with querying) provides aggressive caching / memory management under its own control (via its own storage engine Munin), with config parameters to tweak various cache sizes etc... Google groups suggests that before (may not be the case with latest releases) occasional out-of-memory exceptions as result of un tuned parameters (with sufficient size db / index).
CouchDB seems to take a different approach and leaves the caching to the operating system. Meaning when I GET /db1/doc-id-1 it essential in terms of programming a file read op against the filesystem which the OS can optimize away due to its own caches. Similarly I believe this is same for views and of reduce results (multiple parts of b tree need loading/computed from disk depending on range).
The latter seems superior to me, the OS have gone from years of evolutions in caching/paging etc.. and pressure from other services can balance memory.
Firstly.
Am I correct in my understanding?
Is CouchDB's approach unique to Unix based OSes (although I see they have a Windows port)?
Is there a reason a .Net DB cant relying on the OS to optimize away file reads etc..?
What are the disadvantages and advantages of each approach that would influence choice in building a data store?
Side note: I believe Redis is the same just keeping the index in memory, each GET KEY is a disk hit (which either does hit the disk heads or not depending on the OS file caching)
Jia93,
One of the reasons that we are working the way we do is that we have stronger separation between the layer. CouchDB have much the same optimizations as we do (keeping things in mem), but it is doing that on top of the BTree structure that is directly expose to the application.
Another reason for caching the results is to avoid the costs of parsing the json on every single request.
I'm getting more into Node.js and am enjoying it. I'm moving more into web application development.
I have wrapped my head around Node.js and currently using Backbone for the front end. I'm making a few applications that uses Backbone to communicate with the server using a RESTful API. In Node.js, I will be using the Express framework.
I'm reaching a point where I need a simple database on the server. I'm used to PostgreSQL and MySQL with Django, but what I'm needing here is some simple data storage etc. I know about CouchDB, MongoDB and Redis, but I'm just not sure which one to use?
Is any one of them better suited for Node.js? Is any one of them better for beginners, moving from relational databases? I'm just needing some guidance on which to choose, I've come this far, but when it's coming to these sort of databases, I'm just not sure...
Is any one of them better suited for
Node JS?
Better suited especially for node.js probably no, but each of them is better suited for certain scenarios based on your application needs or use cases.
Redis is an advanced key-value store and probably the fastest one among the three NoSQL solutions. Besides basic key data manipulation it supports rich data structures such as lists, sets, hashes or pub/sub functionality which can be really handy, namely in statistics or other real-time madness. It however lacks some sort of querying language.
CouchDB is document oriented store which is very durable, offers MVCC, REST interface, great replication system and map-reduce querying. It can be used for wide area of scenarios and substitute your RDBMS, however if you are used to ad hoc SQL queries then you may have certain problems with it's map-reduce views.
MongoDB is also document oriented store like CouchDB and it supports ad hoc querying besides map-reduce which is probably one of the crucial features why people searching for DRBMS substitution choose MongoDB over the other NoSQL solutions.
Is any one of them better for
beginners, moving from relational
databases?
Since you are coming from the RDBMS world and you are probably used to SQL then, I think, you should go with the Mongodb because, unlike Redis or CouchDB, it supports ad hoc queries and the querying mechanism is similar to SQL. However there may be areas, depending on your application scenarios, where Redis or CouchDB may be better suited to do the job.
I'm looking at both projects and I can't really see the difference
from Cassandra Site:
Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store...Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.
from CouchDB Site:
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API.
That said, I see the specific differences between each project as: access methods, written languages, etc. but to put AN EXAMPLE, when you talk about SOLR or Sphinx you know both are indexers with big differences but at the end are indexers.
Can I say here that Cassandra and CouchDB are non-relational databases that in some cases one can replace the other?
CouchDB is a document store. You put documents (JSON objects) in it and define views (indexes) over them. The objects can be arbitrarily complex with potentially deep structure. Further, they are not constrained to following some consistent schema.
Cassandra is a ragged-table key-value store. It just stores rows, each of which has a set of named columns grouped in to families with values. It sounds quite close to BigTable; BigTable doesn't require each row to have the same structure (unlike an SQL database). The values may have some structure, but this kind of store doesn't know anything about that -- they're just strings/byte sequences.
Yes, they are both non-relational databases, and there is probably a fair amount of overlap in their applicability, but they do have distinctly different data organization models. Each can probably be forced into emulating the other, but each model will map best to a different set of problems.
CouchDB has a feature present in very few open source database technologies: offline replication. CouchDB is designed so that applications can be run at the edge of the network. These applications are available even when internet connectivity fails.
Offline replication can also be leveraged to build large clusters, but CouchDB is designed to be robust and simple whether it is running on a single server, a datacenter, or even a smartphone.