I'm considering a setup where I have entities stored both in a document DB (e.g. CouchDB) and a graph DB (e.g. Neo4j).
The rationale is to store each entity's information (data, blobs, values, complex internal structure) in the document DB, while storing the entity relations (parents, children, associated entities) in the graph DB.
Has anyone done / seen / been bitten by a setup like this? What kind of issues should I expect? The first thing that comes to mind is two-phase commit, but backups are problematic here too.
You may want to check out the book "Seven Databases in Seven Weeks". Its 8th chapter covers building a polyglot setup with CouchDB, Neo4j and Redis.
Ran,
Since CouchDB and most (all?) document/KV stores do not support transactions, you can stop worrying about two-phase commit: it simply isn't available to you. You can do XA transactions between Neo4j and MySQL, for example, but not with CouchDB or its relatives.
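Without XA, one common workaround is a best-effort write with a compensating undo. The sketch below is a hypothetical illustration using in-memory stand-ins (`documents`, `graph`, `addEdge` and `saveEntity` are invented names, and the graph write can be made to fail on demand). It is not a substitute for a transaction: a crash between the two steps can still leave the stores out of sync, so something like a periodic consistency check would still be needed.

```javascript
// Hypothetical in-memory stand-ins for the document store and the graph store.
const documents = new Map();
const graph = { edges: [], failNextWrite: false };

function addEdge(from, to) {
  // Simulate a failure of the graph database on demand.
  if (graph.failNextWrite) {
    graph.failNextWrite = false;
    throw new Error("graph write failed");
  }
  graph.edges.push({ from, to });
}

// Best-effort dual write: document first, then relation. If the relation
// write fails, undo the document write (a compensating action).
function saveEntity(doc, parentId) {
  const previous = documents.get(doc._id);
  documents.set(doc._id, doc);
  try {
    if (parentId !== undefined) addEdge(parentId, doc._id);
    return true;
  } catch (err) {
    // Restore whatever the document store held before this call.
    if (previous === undefined) documents.delete(doc._id);
    else documents.set(doc._id, previous);
    return false;
  }
}
```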
Indeed, for simplicity's sake, why not a pure graph database architecture? You get better expressiveness and transactions; what is the rationale for adding another moving part in the form of a second store type?
We are creating a new web application and we intend to use CouchDB. The old web application is being rewritten and we are migrating from an RDBMS to CouchDB. I have an RDBMS schema with 10+ tables and I want to recreate the same in CouchDB. Which is the better approach to do this in CouchDB?
Options
- Create a new database in CouchDB for every table in my RDBMS schema.
- Create only one database in CouchDB and store all RDBMS tables in it, with an explicit field called doc_type/table_type indicating which RDBMS table (row type) each document represents.
What are the pros and cons of these approaches? What is the recommended approach?
It all depends.
In general, be wary of trying to "translate" an RDBMS schema naively to CouchDB; it rarely ends up in a happy place. A relational schema, if designed well, will be normalized and reliant on multi-table joins to retrieve data. In CouchDB, your data model will (probably) be normalized much less, and each document will instead represent either a row from a table or a row returned by a join in the relational DB.
In CouchDB there are no joins, and no atomic transactions beyond a single document. When designing a data model for CouchDB, consider how data is accessed and changed: data that needs to be accessed or updated atomically belongs in the same document.
As to the many databases vs a single database with documents with a "type" field, the single database option allows you to easily perform map-reduce queries across your whole data set. This isn't possible if you use multiple databases, as a map-reduce view is strictly per-database. The number of databases should be dictated by the access pattern -- if you have data that is only ever accessed by a subset of your application's queries, and never needs slicing and dicing with other data, that can be housed in a separate database.
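To illustrate why a single database with a `doc_type` field makes cross-type queries easy, here is a tiny in-memory imitation of a CouchDB map/reduce view. The `runView` and `countByKey` helpers and the sample documents are hypothetical; in CouchDB itself the map function would live in a design document and the built-in `_count` reduce (queried with `group=true`) would do the counting.

```javascript
// A CouchDB-style map function: called once per document, emits key/value rows.
const mapByType = function (doc) {
  if (doc.doc_type) emit(doc.doc_type, 1);
};

// Hypothetical harness imitating how CouchDB runs a view over one database.
let rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

function runView(docs, mapFn) {
  rows = [];
  for (const doc of docs) mapFn(doc);
  // Rows come back sorted by key (CouchDB uses its own collation rules;
  // plain string comparison is a simplification here).
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Grouped counting, equivalent in spirit to _count with group=true.
function countByKey(viewRows) {
  const counts = {};
  for (const row of viewRows) counts[row.key] = (counts[row.key] || 0) + 1;
  return counts;
}

// One database holding rows from several former RDBMS tables.
const db = [
  { _id: "c1", doc_type: "customer", name: "Acme" },
  { _id: "o1", doc_type: "order", customer_id: "c1" },
  { _id: "o2", doc_type: "order", customer_id: "c1" },
  { _id: "p1", doc_type: "product", sku: "X-1" },
];
```

With one database per table, the same view would have to be defined and queried separately in each database, and the results merged by the application.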
I'd also recommend checking out the partitioned database facility in CouchDB (3.0 and later) and Cloudant.
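In a partitioned database (created with `?partitioned=true`), every document `_id` has the form `partition:docid`. A small helper for building and splitting such IDs might look like the sketch below; `partitionedId` and `splitPartitionedId` are hypothetical names, the validation checks are a conservative assumption (see the CouchDB documentation for the exact rules), and using a user or tenant as the partition key is only one possible choice.

```javascript
// Build a partitioned document _id of the form "<partition>:<docid>".
// The checks (non-empty, no colon, no leading underscore) are assumptions.
function partitionedId(partition, docId) {
  if (!partition || partition.includes(":") || partition.startsWith("_")) {
    throw new Error(`invalid partition key: ${partition}`);
  }
  return `${partition}:${docId}`;
}

// Split an _id back into its partition key and the remainder of the id.
function splitPartitionedId(id) {
  const sep = id.indexOf(":");
  if (sep <= 0) throw new Error(`not a partitioned id: ${id}`);
  return { partition: id.slice(0, sep), docId: id.slice(sep + 1) };
}
```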
I’m just starting out with CouchDB (2.1), and I’m planning to use it to replicate confidential per-user data from a mobile app up to my server. I’ve read that per-user databases are the best way to do this, and I’ve set that up. Each database has a mix of user-created documents of types Foo and Bar.
Now, I’d also like to be able to collect multi-user slices of that data together into one database and build views on it for admin reporting. Say I want a database which contains all the Foos from all users. So far so good: an entry in _replicator with a filter, from each user database to one target database, does the job.
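For reference, the filtered replication described above could be set up roughly as follows. The filter function is stored as source text in a design document in each user database, and each `_replicator` document names it as `"ddoc/filter"`. The host, the design document name `app`, the filter name `foo_only`, the target database `all_foos` and the helper `replicatorDocFor` are hypothetical placeholders, and credentials are omitted.

```javascript
// Design document to be saved into each user database. The filter returns
// true for documents that should be replicated (here: type "Foo").
const designDoc = {
  _id: "_design/app",
  filters: {
    foo_only: "function (doc, req) { return doc.type === 'Foo'; }",
  },
};

// One _replicator document per user database, all pointing at one target.
function replicatorDocFor(userDbName) {
  return {
    _id: `collect-foo-${userDbName}`,
    source: `http://localhost:5984/${userDbName}`,
    target: "http://localhost:5984/all_foos",
    filter: "app/foo_only",
    continuous: true,
  };
}
```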
But looking at the combined database, I can’t tell which user a given Foo came from. I could write the user id into each document within the per-user database but that seems redundant and adds the complexity of validation. Is there any other way?
CouchDB's replicator simply tries to match up the exact state of a given document in the target database, and if it can't, it stores more or less the exact source contents anyway (as a conflicting version).
Furthermore the _rev field of a document, which the replication system uses to check if a document needs to be updated, is actually based on (a hash over) the other document fields.
So unfortunately you can't add metadata during replication. This would indeed be handy for this and other per-user vs. shared replication situations, but it's not something CouchDB currently supports, and it would break some optimizations to add support for it.
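A simplified illustration of the point about `_rev`: if a revision id is derived from a hash of the document body, any metadata added in transit changes the hash, so the copy no longer matches the source revision. This is not CouchDB's exact algorithm (the real hash also covers things like the previous revision); `fakeRev` is a hypothetical stand-in using MD5 from Node's built-in `crypto` module.

```javascript
const crypto = require("crypto");

// Simplified, hypothetical revision id: "<generation>-<md5 of body>".
// _id and _rev are excluded because they are not part of the content.
function fakeRev(doc, generation) {
  const { _id, _rev, ...body } = doc;
  const hash = crypto.createHash("md5").update(JSON.stringify(body)).digest("hex");
  return `${generation}-${hash}`;
}

const source = { _id: "foo1", type: "Foo", value: 3 };
// What the target would hold if replication stamped a user id onto the copy.
const stamped = { ...source, user: "alice" };
```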
I could write the user id into each document within the per-user database but that seems redundant and adds the complexity of validation. Is there any other way?
Including something like a .user field in each document is the right solution.
As far as being redundant, I wouldn't think of it that way, or at least not as a bad thing. You'll find that with CouchDB (as with other NoSQL stores) there's a tendency to "denormalize" data to begin with. Especially given the things replication lets me do operationally and architecturally, I'd much rather have a self-contained document than one that relies on metadata derived from a database name.
I'm not sure exactly how, in your case, an extra field will make validation more complex, so I can't fully speak to that. You do want to make sure the user writing the document has set it "honestly", so yes, there is a bit more complication, but in most cases it is not too burdensome.
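One way to check that `.user` was set "honestly" is a `validate_doc_update` function in a design document of each per-user database. CouchDB calls it with `(newDoc, oldDoc, userCtx, secObj)` and rejects the write if it throws `{ forbidden: ... }`; in the design document it is stored as the function's source text. The sketch assumes the field is called `user` and holds the CouchDB user name; `validateUserField` is just a local name for testing, and `var` is used in case the server's JavaScript engine is an older one.

```javascript
// validate_doc_update: reject writes whose .user field does not match the
// authenticated user. CouchDB rejects the write if this throws {forbidden}.
function validateUserField(newDoc, oldDoc, userCtx, secObj) {
  var isAdmin = userCtx.roles.indexOf("_admin") !== -1;
  // Admins may write anything; deletion tombstones carry no .user field.
  if (isAdmin || newDoc._deleted) {
    return;
  }
  if (newDoc.user !== userCtx.name) {
    throw { forbidden: "doc.user must be the name of the writing user" };
  }
  // An existing document must not be reassigned to another user.
  if (oldDoc && oldDoc.user !== newDoc.user) {
    throw { forbidden: "doc.user cannot be changed" };
  }
}
```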
I want to build a simple ERP app for my company. I come from a PHP/MySQL background; I'm leaving PHP and moving to Node.js because it outmatches PHP in many ways, so I thought I had to learn Node.js.
My company is a meat and poultry importer with hundreds of products, and since it uses a catch-weight inventory system, each carton has to be recorded as an individual record; there can be millions of cartons. Besides inventory, my app must also have other ERP functions such as sales, purchasing, HRD, logistics and many others.
What I already have in mind is to use:
- Node.js, then
- Express / Meteor as the backend
- Angular/React as the template engine, plus D3.js (not sure yet) to visualize various reports
- various DBMSs (I think I can't rely on just one DBMS; I have to harness the power of both RDBMS and NoSQL DBMS):
  - MySQL for storing user data
  - a key/value DBMS (Elasticsearch / Redis) for data that rarely changes but is used often and needs fast access, such as product data
  - a graph DBMS (Neo4j) for storing user relations, warehouse structure and product movement
  - a document DBMS (MongoDB) for storing tally weights, product details and many other document-based data
Choosing the right tools and making a good plan is halfway to achieving the goal, so I'd appreciate any enlightenment from all of you: please suggest what I should and shouldn't choose.
Your stack is okay, but you need to reconsider the number of databases in use. You don't need the graph database; MySQL can handle that for you. As for your key/value DBMS, go with MongoDB: it is a good key/value store as well as a document store, so there's no need for the extra database.
I'm taking my first steps with NoSQL databases, and so far my knowledge is very basic. I'm trying to set up a database for a small invoice system.
In SQL I'd create 4 tables: Products, Customers, Invoices, and a join table linking invoices to products.
But how do I do this with NoSQL? Do I even build relations, or just build 1 document for each invoice?
You should keep in mind that NoSQL design is based not only on data structure but also, strongly, on how the data is used. So you should first ask yourself what kind of queries you need to run over your data and take it from there.
First figure out how far you want to go with denormalization and aggregation. For instance: which sets of data will you often need to query or update together? Try to keep each such set in a single document, even if it means duplicating data from other entities (e.g. storing customer data along with the invoice data).
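As an illustration of storing customer data along with the invoice, the sketch below takes rows shaped like the four SQL tables from the question and folds them into one self-contained invoice document, copying the customer and product details in. All table shapes, field names and the `invoice:` id prefix are hypothetical.

```javascript
// Hypothetical normalized data, shaped like the four SQL tables.
const customers = [{ id: 7, name: "Acme Ltd", address: "1 Main St" }];
const products = [
  { id: 1, name: "Widget", price: 2.5 },
  { id: 2, name: "Gadget", price: 10 },
];
const invoices = [{ id: 100, customer_id: 7, date: "2020-01-31" }];
const invoiceLines = [
  { invoice_id: 100, product_id: 1, qty: 4 },
  { invoice_id: 100, product_id: 2, qty: 1 },
];

// Build one denormalized invoice document: customer and product data are
// copied in, so reading or updating the invoice touches a single document.
function toInvoiceDoc(invoiceId) {
  const inv = invoices.find((i) => i.id === invoiceId);
  const customer = customers.find((c) => c.id === inv.customer_id);
  const lines = invoiceLines
    .filter((l) => l.invoice_id === invoiceId)
    .map((l) => {
      const p = products.find((prod) => prod.id === l.product_id);
      return { product_id: p.id, name: p.name, price: p.price, qty: l.qty };
    });
  const total = lines.reduce((sum, l) => sum + l.price * l.qty, 0);
  return {
    _id: `invoice:${inv.id}`,
    type: "invoice",
    date: inv.date,
    customer: { id: customer.id, name: customer.name, address: customer.address },
    lines,
    total,
  };
}
```

The copied customer and product fields are a snapshot taken when the invoice is written, which is usually what an invoice should record anyway.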
So ask yourself why you want to use a non-relational database and how you will use that data. Then decide which modeling techniques to apply, and how far. The Highly Scalable Blog has a great article about NoSQL data modeling if you care to give it a read.
... or just build 1 document for each invoice.
Yes, do that to begin with. Think of your data in CouchDB as a read-only copy of your data in the relational database: the docs are like the results of your SQL queries.
Do I even build relations?
Of course you can; it's the same as in your SQL tables. You include the ids of foreign docs and name the property after the relation you want to express, e.g. doc.customer_id in an invoice doc can point to the doc._id of a customer doc.
It's helpful to think of CouchDB views as "relations", e.g. you can create a view called InvoicesByCustomer from the example above.
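The InvoicesByCustomer view from the example above could look roughly like the map function below, assuming invoice docs carry `type: "invoice"` and a `customer_id` field. The `queryByKey` harness imitates querying the view with `?key=...` in memory and is hypothetical; in CouchDB the map function would live in a design document.

```javascript
// Map function for a view named InvoicesByCustomer.
function invoicesByCustomer(doc) {
  if (doc.type === "invoice" && doc.customer_id) {
    emit(doc.customer_id, { total: doc.total });
  }
}

// Hypothetical in-memory harness standing in for querying the view by key.
let emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}
function queryByKey(docs, mapFn, key) {
  emitted = [];
  for (const doc of docs) {
    const before = emitted.length;
    mapFn(doc);
    // Like CouchDB view rows, record the _id of the emitting document.
    for (let i = before; i < emitted.length; i++) emitted[i].id = doc._id;
  }
  return emitted.filter((row) => row.key === key);
}

const docs = [
  { _id: "cust-1", type: "customer", name: "Acme" },
  { _id: "inv-1", type: "invoice", customer_id: "cust-1", total: 20 },
  { _id: "inv-2", type: "invoice", customer_id: "cust-2", total: 5 },
  { _id: "inv-3", type: "invoice", customer_id: "cust-1", total: 7 },
];
```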
In summary, I would recommend starting with the one-document-per-invoice approach and following JavoSN's hint ...
So you should first ask yourself what kind of queries you need to do over your data and take it from there
... once you know that clearly, it's time to dig deeper into your document design options.
As I was reading up about CouchDB, I stumbled upon a question about transactions and CouchDB. Apparently the way to handle transactions in Couch is to pull the latest version and compare it to the version you are currently working with. This can present problems if data is changing quickly. The other way is to separate the transactional data into multiple documents and combine them with map/reduce. This also seems less than optimal.
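The "pull the latest version and compare" approach is CouchDB's optimistic concurrency: an update must carry the document's current `_rev`, and a stale one is rejected with HTTP 409 Conflict. The in-memory `store` below is a hypothetical stand-in that mimics that rule (its revision strings are not real CouchDB revisions), so the read-modify-write retry loop can be shown without a server.

```javascript
// Hypothetical in-memory store mimicking CouchDB's _rev check.
const store = new Map();
let revCounter = 0;

function get(id) {
  const doc = store.get(id);
  return doc ? { ...doc } : undefined;
}

function put(doc) {
  const current = store.get(doc._id);
  if (current && current._rev !== doc._rev) {
    const err = new Error("conflict");
    err.status = 409; // CouchDB answers a stale _rev with 409 Conflict
    throw err;
  }
  const saved = { ...doc, _rev: `r${++revCounter}` };
  store.set(doc._id, saved);
  return saved._rev;
}

// Read-modify-write with retries: on a 409, re-read the latest revision
// and apply the change again.
function updateWithRetry(id, mutate, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = get(id);
    mutate(doc);
    try {
      return put(doc);
    } catch (err) {
      if (err.status !== 409 || attempt === maxAttempts) throw err;
    }
  }
}
```

Under heavy write contention on one document, such retries pile up, which is the "data changing quickly" problem mentioned above.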
I was thinking about using Redis for this sort of data. The increment and decrement commands (INCR/DECR) seem fairly amazing for this purpose.
So I could just write some sort of string for a transactional key like:
```javascript
// Some user document; page_views refers to a Redis key instead of holding a value.
const userDoc = {
  name: "guy",
  id: 10,
  page_views: "redis user:page_views:10",
};
```
Then, if I read something like "redis" inside some piece of transactional data, I know to go get that information from Redis. I suppose I could decide these things beforehand, but since a document-oriented database's primary mission is to be flexible and not bind data to columns, I figured there might be an easier way.
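A minimal sketch of the "redis" marker idea from the question, assuming string values of the form `"redis <key>"` point at a Redis key holding a numeric counter. A plain `Map` stands in for Redis here; `redisStub`, `incr`, `resolveRedisRefs` and the `"redis "` prefix convention are all hypothetical. With a real client the lookup would be an asynchronous GET and the increment an INCR on the same key.

```javascript
// Hypothetical stand-in for Redis: key -> string value, with an INCR-like helper.
const redisStub = new Map();
function incr(key) {
  const next = Number(redisStub.get(key) || 0) + 1;
  redisStub.set(key, String(next));
  return next;
}

const REDIS_PREFIX = "redis ";

// Return a copy of doc in which every top-level "redis <key>" string is
// replaced by the (numeric) value currently stored under <key>, or null.
function resolveRedisRefs(doc) {
  const resolved = {};
  for (const [field, value] of Object.entries(doc)) {
    if (typeof value === "string" && value.startsWith(REDIS_PREFIX)) {
      const key = value.slice(REDIS_PREFIX.length);
      resolved[field] = redisStub.has(key) ? Number(redisStub.get(key)) : null;
    } else {
      resolved[field] = value;
    }
  }
  return resolved;
}

const userDoc = { name: "guy", id: 10, page_views: "redis user:page_views:10" };
```

Only top-level fields are scanned; nested structures would need a recursive walk.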
Is there an easy way to link Redis data to CouchDB? Should I be doing this all manually, for the few fields that come up? Any other thoughts? Would it be better to update this transactional data "eventually" in the user document, or simply not store it there?
Both Redis and CouchDB are "easy" (that is, simple). So in that regard, what you are describing is easy. Of course, by using two databases, you have increased the complexity of your application. But on the other hand, the CouchDB+Redis combination is gaining popularity.
The only tool I know that integrates the two is Mikeal Rogers's redcouch. It is a simple tool. Perhaps you could extend it to add what you need (and send a pull request!).
A broader consideration is that Redis does not have the full replication feature set that CouchDB does, so Redis might restrict your future options with CouchDB. Specifically, Redis does not support multi-master replication. In contrast with CouchDB, you will always have a centralized Redis database. (Correct me if I'm wrong; I am stronger with CouchDB than with Redis.)