What is the difference between Cassandra and CouchDB? - couchdb

I'm looking at both projects and I can't really see the difference
from Cassandra Site:
Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store...Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.
from CouchDB Site:
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API.
That said, I see the specific differences between each project as: access methods, written languages, etc. but to put AN EXAMPLE, when you talk about SOLR or Sphinx you know both are indexers with big differences but at the end are indexers.
Can I say here that Cassandra and CouchDB are non-relational databases that in some cases one can replace the other?

CouchDB is a document store. You put documents (JSON objects) in it and define views (indexes) over them. The objects can be arbitrarily complex with potentially deep structure. Further, they are not constrained to following some consistent schema.
Cassandra is a ragged-table key-value store. It just stores rows, each of which has a set of named columns grouped in to families with values. It sounds quite close to BigTable; BigTable doesn't require each row to have the same structure (unlike an SQL database). The values may have some structure, but this kind of store doesn't know anything about that -- they're just strings/byte sequences.
Yes, they are both non-relational databases, and there is probably a fair amount of overlap in their applicability, but they do have distinctly different data organization models. Each can probably be forced into emulating the other, but each model will map best to a different set of problems.

CouchDB has a feature present in very few open source database technologies: offline replication. CouchDB is designed so that applications can be run at the edge of the network. These applications are available even when internet connectivity fails.
Offline replication can also be leveraged to build large clusters, but CouchDB is designed to be robust and simple whether it is running on a single server, a datacenter, or even a smartphone.

Related

web real time analytics dashboard: which technologies should use? (node/django, cassandra/mongodb...)

we want to develop a dashboard to analyze geospatial data.
This is a small and close approach to what we want to do: http://adilmoujahid.com/images/data-viz-talkingdata.gif
Our main concerns are about the backend technologies to be used. (front will be D3.js, DC.js, leaflet.js...)
Between Django and node.js, we think that we will use node.js, cause we've read than its faster than Django for this kind of tasks. But we are not sure and we are open to ideas.
But about Mongo or Cassandra, we are so confused. Our data is mostly structured, so store it in tables like Cassandra would make it easy to manage, also Cassandra seems to have better performance. However, we also have IoT devices data, with lots of real-time GPS location...
Which suggestions can you give to us to achieve our goal?
TL;DR Summary;
Dashboard with hundreds of simultaneous users.
Stored data will be mostly structured text/numbers, but will include also images, GPS-arrays, IoT sensors, geographical data (vector-polygons & rasters)
Databases will receive high write load coming from sensors.
Dashboard performance is so important. Its more important to read data in real time, than keeping it uncorrupted/secure.
Most calculus/math will be calculated in the client's browser, the server will try to avoid mathematical operations.
Disclaimer: I'm a DataStax employee so I'll comment on the Cassandra piece.
Cassandra is a good choice for this if your dashboard can be planned around a set of known queries. If those users will be doing ad-hoc queries directly to the database from the dashboard, you'll want something with a little more flexibility like ElasticSearch or (shameless plug) DataStax Search. Especially if you expect the queries/database to handle some of the geospatial logic.
JaguarDB has very strong support of geospatial data (2D and 3D). It allows you to store multi-measurements per point location while other databases support only one measurement (pointm). Many complex queries such as Voronoi polygon, convexhull are also supported. It is open source, distributed and sharded, multiple columns indexes, etc.
Concerning Postgresql and Cassandra, is there much difference in RAM/CPU/DISK usage between them?
Our use case does not require transactions, it will be in a single node and we will have IoT devices writing data up to 500 times per second. However ive read that Geographical data that works better with Potstgis than cassandra...
According to this use case, do you recommend Cassandra or Postgis?

Nosql does not support to ACID property then why some companies using NoSql databases

Nosql does not support to ACID property then why some companies using NoSql databases Like Facebook uses Cassandra and Amazon uses dynamodb
Companies use Cassandra when they need their application to scale to use many servers for higher capacity and higher reliability by keeping redundant copies of data on multiple servers.
In many cases ACID transactions are not needed for building a correctly functioning system.
Databases which have good support for ACID transactions are often limited to a single server, so they do not scale.
Use Cosmos DB and be happy as it supports all most everything you need for data :
Here is the FAQ: https://learn.microsoft.com/en-us/azure/cosmos-db/faq

Which database out of CouchDB, MongoDB and Redis is good for starting out with Node.js?

I'm getting more into Node.js and am enjoying it. I'm moving more into web application development.
I have wrapped my head around Node.js and currently using Backbone for the front end. I'm making a few applications that uses Backbone to communicate with the server using a RESTful API. In Node.js, I will be using the Express framework.
I'm reaching a point where I need a simple database on the server. I'm used to PostgreSQL and MySQL with Django, but what I'm needing here is some simple data storage etc. I know about CouchDB, MongoDB and Redis, but I'm just not sure which one to use?
Is any one of them better suited for Node.js? Is any one of them better for beginners, moving from relational databases? I'm just needing some guidance on which to choose, I've come this far, but when it's coming to these sort of databases, I'm just not sure...
Is any one of them better suited for
Node JS?
Better suited especially for node.js probably no, but each of them is better suited for certain scenarios based on your application needs or use cases.
Redis is an advanced key-value store and probably the fastest one among the three NoSQL solutions. Besides basic key data manipulation it supports rich data structures such as lists, sets, hashes or pub/sub functionality which can be really handy, namely in statistics or other real-time madness. It however lacks some sort of querying language.
CouchDB is document oriented store which is very durable, offers MVCC, REST interface, great replication system and map-reduce querying. It can be used for wide area of scenarios and substitute your RDBMS, however if you are used to ad hoc SQL queries then you may have certain problems with it's map-reduce views.
MongoDB is also document oriented store like CouchDB and it supports ad hoc querying besides map-reduce which is probably one of the crucial features why people searching for DRBMS substitution choose MongoDB over the other NoSQL solutions.
Is any one of them better for
beginners, moving from relational
databases?
Since you are coming from the RDBMS world and you are probably used to SQL then, I think, you should go with the Mongodb because, unlike Redis or CouchDB, it supports ad hoc queries and the querying mechanism is similar to SQL. However there may be areas, depending on your application scenarios, where Redis or CouchDB may be better suited to do the job.

Which is the most suitable Key-Value Store for a RDBMS background person?

Is there a distinct winner among all the key-value stores? Cassandra, MongoDB, CouchDB? and do they all follow some central guidelines? or they all have their own say in defining their APIs.
I'm asking this question, especially from a perspective of a RDBMS skilled person who is new to key-value stores. Which one should we follow to best grasp the understanding/usage of this field?
We know about the RDMS from their theories that all available DBs (Oracle, SQL Server, ..) will have all the artifacts e.g. Tables, Indexes, Foreign keys etc. The only difference in these is the efficiency, security, features.
How can I know about the universal theory of these document-centered Databases and know what are the minimal artifacts that all these DBs (Mongo, Couch etc.) will have?
I work on MongoDB so I'm biased that way, but I think it is a nice combination of the things you're used to with an RDBMS (like dynamic queries and secondary indexes) and the performance and scalability of a key-value store.
Cassandra has a nice distributed model but afaik doesn't support secondary indexes. The document data model support by Mongo and Couch also allows for a little bit more complexity than the tabular model Cassandra uses.
One of the big differences between Mongo and Couch is the way queries are constructed. Couch uses a cool map/reduce mechanism, but your queries must be defined in advance. Mongo uses a more traditional dynamic query model that is more like what you're used to in an RDBMS.

.Net 4.0 Memory-Mapped Files verses RDMS Storage

I'm interested in people's thoughts comparing storing data in a traditional SQL based Database or utilising a Memory-Mapped File such as the one in the new .Net 4.0 runtime. The data in question would be arrays of simple structures.
Obvious pros and cons:
SQL Database Pros
Adhoc query support
SQL Management Tools
Schema changes (adding more columns and setting default values)
Memory-Mapped Pros
Lighter overhead? (this is an assumption on my part)
Shareable between process threads
Any others?
Is it worth it for performance gains?
You could try out MongoDB, and get a mixture of both worlds (database-like features over a memory mapped store).
MongoDB bridges the gap between
key-value stores (which are fast and
highly scalable) and traditional RDBMS
systems (which provide rich queries
and deep functionality).
Here's a good article that can walk you through installing and coding to MongoDB:
Going NoSQL with MongoDB
SQLServer can use memory mapped files if you choose "SharedMemory" as the protocol. Otherwise it'll use Pipes, TCP or VIA.
Regarding pros and cons.. to me they are amost not comparable. SQL has the whole query/multiuser/transaction etc infrastructure built in. If you store with MMF's you are on your own regarding all that. On the other hand, MMF are built in the OS.. no seed for a server/service.

Resources