RDBMS tables to CouchDB database tables - couchdb

We are creating a new web application and we intend to use CouchDB. The old web application is being rewritten and we are migrating from an RDBMS to CouchDB. I have an RDBMS schema with 10+ tables and I want to recreate the same in CouchDB. Which is the better approach to do this in CouchDB?
Options
Create a new database in CouchDB for every table in my RDBMS schema
Create only one database in CouchDB and store the rows of all RDBMS tables in it, with an explicit field called doc_type/table_type on each document to indicate which RDBMS table the row came from.
What are the pros and cons of these approaches? What is the recommended approach?

It all depends.
In general, be wary of trying to "translate" an RDBMS schema naively to CouchDB -- it rarely ends up in a happy place. A relational schema -- if designed well -- will be normalized and reliant on multi-table joins to retrieve data. In CouchDB, your data model will (probably) not be normalized nearly as much; instead, a document will typically represent either a row from a table or a row returned from a join in the relational DB.
In CouchDB there are no joins, and no atomic transactions above the document unit. When designing a data model for CouchDB, consider how data is accessed and changed. Things that need to be accessed atomically belong in the same document.
As to the many databases vs a single database with documents with a "type" field, the single database option allows you to easily perform map-reduce queries across your whole data set. This isn't possible if you use multiple databases, as a map-reduce view is strictly per-database. The number of databases should be dictated by the access pattern -- if you have data that is only ever accessed by a subset of your application's queries, and never needs slicing and dicing with other data, that can be housed in a separate database.
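As a rough illustration of the single-database option, here is a minimal Python sketch that emulates a view over documents of two types kept in one place. It is plain Python standing in for a CouchDB view, not the CouchDB API; the field names (type, customer_id, total) and the sample documents are assumptions made for the example. The map function emits compound keys so that each customer row is followed by that customer's orders, which is only possible because both document types are visible to the same view.

```python
# One "database": documents from two former tables live side by side,
# distinguished only by a "type" field (hypothetical names and data).
docs = [
    {"_id": "order:1", "type": "order", "customer_id": "customer:1", "total": 30},
    {"_id": "customer:2", "type": "customer", "name": "Bob"},
    {"_id": "order:2", "type": "order", "customer_id": "customer:1", "total": 12},
    {"_id": "customer:1", "type": "customer", "name": "Alice"},
    {"_id": "order:3", "type": "order", "customer_id": "customer:2", "total": 5},
]


def map_doc(doc):
    """Map step: emit (key, value) pairs; the key is (customer id, rank)."""
    if doc["type"] == "customer":
        yield (doc["_id"], 0), doc["name"]
    elif doc["type"] == "order":
        yield (doc["customer_id"], 1), doc["total"]


def run_view(documents):
    """Collect all emitted rows and sort them by key, as a view index would."""
    rows = [row for doc in documents for row in map_doc(doc)]
    return sorted(rows, key=lambda row: row[0])


rows = run_view(docs)
for key, value in rows:
    print(key, value)
```

With one database per table, the customer and order documents would sit in different databases and no single view could interleave them like this.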
I'd also recommend checking out the new partitioned database facility in CouchDB and Cloudant.

Related

Do we need to denormalize model in Cassandra?

We usually store a graph of objects in databases. In an RDBMS, we need to make joins to retrieve the relationships between objects. In Cassandra, denormalizing the model to fit the queries is encouraged. But doing this makes updating the model more complex or more specialized.
Cassandra has complex data types such as set, map, list and tuple. These types make it possible to store the relationships between objects (association, aggregation, composition) in a straightforward manner, for instance by storing the ids of the connected objects in a list.
The only drawback is then having to split a complex SQL join into several requests.
I've not seen any papers on Cassandra dealing with this kind of solution. Does anyone know why this solution is not promoted?
Cassandra is a highly write-optimized database. Writes are cheap, so an extra three or four writes hardly matter compared with the difficulties that normalized data would create.
Regarding graphs of objects, the answer is no: Cassandra isn't meant to store graphs of objects. Cassandra is meant to store data for queries. The RDBMS equivalent would be views in PostgreSQL. Data has to be stored in a way that lets a query be serviced easily, the main reason being that reads are slow. The goal of data modeling in Cassandra is to make sure a read is almost always served from a single partition.
If the data were normalized, a query would need to hit a minimum of two partitions, and worst-case scenarios would create latencies that would render the application unusable for any practical purpose.
Hence data modeling in Cassandra is always centered on queries and not the relationship between objects.
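To make the query-centered idea concrete, here is a minimal Python sketch in which each dict stands in for one Cassandra table and the dict key plays the role of the partition key. It does not use Cassandra or any driver; the table names (posts_by_author, posts_by_tag), the fields, and the data are all hypothetical.

```python
from collections import defaultdict

# One "table" per query; the dict key stands in for the partition key.
posts_by_author = defaultdict(list)
posts_by_tag = defaultdict(list)


def add_post(post_id, author, tag, title):
    """Write the same post once per query table (the cheap extra writes)."""
    row = {"post_id": post_id, "author": author, "tag": tag, "title": title}
    posts_by_author[author].append(dict(row))  # separate copy per table
    posts_by_tag[tag].append(dict(row))


add_post(1, "alice", "cassandra", "Modeling by query")
add_post(2, "bob", "cassandra", "Partitions explained")
add_post(3, "alice", "couchdb", "Views as indexes")

# Each read is a lookup in exactly one "partition"; no join is needed.
alice_titles = [row["title"] for row in posts_by_author["alice"]]
cassandra_titles = [row["title"] for row in posts_by_tag["cassandra"]]
print(alice_titles)
print(cassandra_titles)
```

The cost of this layout is on the write side: every change to a post has to be applied to each table that holds a copy of it.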
More on these basic rules can be found on the DataStax blog:
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling

How to arrange my Data in NoSQL (Invoices)

I'm taking my first steps with NoSQL databases, and so far my knowledge is very basic. I'm trying to set up a database for a small invoice system.
In SQL I'd create 4 tables: Products, Customers, Invoices, and a match table linking invoices to products.
But how do I do this with NoSQL? Do I even build relations, or just build 1 document for each invoice?
You should keep in mind that NoSQL design is not only based on data structure but also strongly on data function. So you should first ask yourself what kind of queries you need to do over your data and take it from there.
First figure out how far you want to go with denormalization and aggregation. For instance: which sets of data will you often need to query or update together? Try to keep those in a single document, even if it means duplicating data from other entities (e.g. storing customer data along with the invoice data).
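As a rough sketch of what such an aggregated document could look like, here is an invoice written as a Python dict with the customer data and line items embedded. The field names, the sample values, and the choice to store prices as integer cents are assumptions made for the example, not a prescribed schema.

```python
# A self-contained invoice document (hypothetical field names and data).
# Customer details are copied into the invoice instead of being joined in.
invoice = {
    "_id": "invoice:1001",
    "type": "invoice",
    "customer": {"id": "customer:7", "name": "Alice", "city": "Berlin"},
    "lines": [
        {"product": "Widget", "unit_price_cents": 250, "quantity": 4},
        {"product": "Gadget", "unit_price_cents": 1000, "quantity": 1},
    ],
}


def invoice_total_cents(doc):
    """Sum price * quantity over the embedded line items."""
    return sum(line["unit_price_cents"] * line["quantity"] for line in doc["lines"])


total_cents = invoice_total_cents(invoice)
print(total_cents)
```

Everything needed to display or total the invoice comes from one document read. The trade-off is that a later change to the customer's details does not propagate to invoices already stored, which for invoices is often the desired behaviour anyway.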
So ask yourself why you want to use a non-relational database, and how you will use that data. Then decide which modeling techniques to apply and how far to take them. The Highly Scalable blog has a great article about NoSQL data modeling if you care to give it a read.
... or just build 1 document for each invoice.
Yes, start with that. Think of your data in CouchDB as a read-only copy of your data in the relational database. The docs are like the results of your SQL queries.
Do I even build relations?
Of course you can; it's the same as in your SQL tables. You include the ids of the foreign docs and name the property after the relation you want to express, e.g. doc.customer_id in an invoice doc can point to the doc._id of a customer doc.
It's helpful to think of CouchDB views as "relations", e.g. you can create a view called InvoicesByCustomer from the example above.
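The idea behind a view like InvoicesByCustomer can be sketched in plain Python as an index keyed by customer_id. This emulates what such a view would return and is not CouchDB code; the "type" field and the sample documents are assumptions, and only customer_id and _id come from the example above.

```python
from collections import defaultdict

# Hypothetical documents: invoices reference a customer doc via customer_id.
docs = [
    {"_id": "customer:7", "type": "customer", "name": "Alice"},
    {"_id": "invoice:1001", "type": "invoice", "customer_id": "customer:7"},
    {"_id": "invoice:1002", "type": "invoice", "customer_id": "customer:7"},
    {"_id": "invoice:1003", "type": "invoice", "customer_id": "customer:9"},
]


def build_invoices_by_customer(documents):
    """Emulate the view: key = customer_id, value = the invoice's _id."""
    index = defaultdict(list)
    for doc in documents:
        if doc.get("type") == "invoice" and "customer_id" in doc:
            index[doc["customer_id"]].append(doc["_id"])
    return dict(index)


invoices_by_customer = build_invoices_by_customer(docs)

# "Following the relation" is a lookup by key in the index.
alice_invoices = invoices_by_customer.get("customer:7", [])
print(alice_invoices)
```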
In summary, I would recommend beginning with the one-document-per-invoice approach and following @JavoSN's hint ...
So you should first ask yourself what kind of queries you need to do over your data and take it from there
... once you know that clearly, it's time to dig deeper into the possible document designs.

Is Cassandra a Key value store or wide column store?

I'm preparing a course on NoSQL for database novices. I did a lot of research online, and now I'm in a dilemma over whether to categorize Cassandra as a Wide Column Store or a Key Value Store. Or shall I call it a two-dimensional Key Value Store? I'm having the same issue with Couchbase. Is it a Key Value store or a Document Store?
I'm looking for a solid way to categorize NoSQL databases as of their 2015 versions. Any help is appreciated.
Since there is a Couchbase answer, I'll jump in on the Cassandra side. From the Cassandra GitHub page:
Cassandra is a partitioned row store. Rows are organized into tables
with a required primary key.
Partitioning means that Cassandra can distribute your data across
multiple machines in an application-transparent matter. Cassandra will
automatically repartition as machines are added and removed from the
cluster.
Row store means that like relational databases, Cassandra organizes
data by rows and columns.
I can't make an informed comment on Cassandra (although my gut instinct is Wide Column over K/V), but for Couchbase I'd probably say there's a stronger argument for categorising it as a document store, given the map/reduce functionality (through views), and the upcoming N1QL query language. There is a compelling argument for it being a K/V store, also, but I'd say for the purposes of communicating differences in competing NoSQL solutions in an educational course, categorising it as a document store wouldn't be unreasonable.
Couchbase can also act as a distributed cache, however, which may be something you wish to touch on in your course.

Using both graph db and document db

I'm considering a setup where I have entities stored both in a document db (e.g. CouchDB) and a graph db (e.g. Neo4j).
The rationale is storing each entity information (data, blobs, values, complex internal structure) in the document db while storing the entity relations (parents, children, associated entities) in the graph db.
Has anyone done / seen / been bitten by a setup like this? What kind of issues should I expect? The first thing that comes to mind is the 2-phase commit. Backups are problematic here too.
You may want to check out the book "Seven Databases in Seven Weeks". The 8th chapter talks about building a polyglot structure with CouchDB, Neo4j and Redis.
Ran,
Since CouchDB and most (all?) of the document/KV stores do not support transactions, you would need to stop worrying about 2-phase commits. You can do XA transactions between Neo4j and MySQL for example, but not CouchDB or its relatives.
Indeed, for simplicity's sake, why not a pure graph database architecture? You get the better expressiveness and transactions - what is the rationale of adding another moving part in the form of a second store type?

Why are document DBs (like MongoDB and CouchDB) better for large amounts of data?

I am very new to the world of document DBs.
So... why are these DBs better than an RDBMS (like MySQL or PostgreSQL) for very large amounts of data?
Document databases implement indexing suited to this kind of data, which is what they are designed for. A conventional database is not designed for storing "documents": you have to work hard to search over your document data, because each document can be in a different format. If you choose a document DB, all of this is already implemented, because the database is built only for documents and provides the functions they need.
You want to distribute your data over multiple machines when you have a lot of data. That means that joins become really slow because joining between data on different machines means a lot of data communication between those machines.
You can store data in a mongodb/couchdb document in a hierarchical way so there is less need for joins.
But it depends on your use case(s). I think that relational databases do a better job when it comes to reporting.
MongoDB and CouchDB don't support transactions. Do you or your customers need transactions?
What do you want to do? Analyze a lot of data (business intelligence/reporting), or make a lot of small modifications per second, i.e. "HVSP (High Volume Simple Processing)"?
