Can CouchDB do this? - couchdb

I evaluating CouchDB & I'm wondering whether it's possible to achieve the following functionality.
I'm planning to develop a web application and the app should allow a 'parent' table and derivatives of this table. The parent table will contains all the fields (master table) and the user will selectively choose fields, which should be saved as separate tables.
My queries are as follows:
Is it possible to save different versions of the same table using CouchDB?
Is there an alternative to creating child tables (and clutter the database)?
I'm new to NoSQL databases and am evaluating CouchDB because it supports JSON out of the box and this format seems to fit the application very well.
If there are alternatives to NOT save the derivatives as separate tables, the better will the application be. Any ideas how I could achieve this?
Thanks in advance.

CouchDB is a document oriented database which means you cannot talk in terms of tables. There are only documents. The _rev (or revision ID) describes a version of a document.
In CouchDB, there are 2 ways to achieve relationships.
Use separate documents
Use an embedded array
If you do not prefer to clutter your database, you can choose to use option (2) by using an embedded array.
This gives you the ability to have cascade delete functionality as well for free.

Related

Filter a subset of fields when replicating in couchdb

I have some documents I would like to replicate to per user databases but only with a subset of fields on the main document. The "selector" option works great to easily select the docs I need but it does not appear to support the "fields" option. I can't find a way to do this without manually filtering the document fields on my node server, which seems quite inefficient.
It is not possible to replicate part of a document. The CouchDB replication mechanism is replicating the whole document as a unit.
I suggest you to review your document design in order to have a better fit with the replication requirements. Maybe it is more accurate to have two separate documents in your case.

How to create a view that gets information from 2 other databases?

I have 2 lotus notes databases which have basically the same information: employee data. As there are too many documents, our team thought it would be better to have all data splitted in two DBs. Also, data of both databases use a form with the same name and design, called frmEmployeeInfo.
The client wants a third database with a view that will contain data (documents) from both databases I mentioned before. I know I can use, for example, outlines to open a view of another database but...is it possible to create a view in this third database that shows documents from other 2 DBs? I'm not sure if this is 'doable'. I don't want to copy documents from the databases into this 3rd. database because I think the database will be very slow, as there will be a lot of documents.
Do you have any kind of suggestion about how can I do it?
Thank you in advance.
You can't do that in classic Notes. But you should be able to do it in XPages (or through a web interface you create).
How big is the database? I have Notes databases with millions of documents, I don't see a need to split them into two, that sounds like a terrible design if you want to access all documents easily.

PouchDB structure

i am new with nosql concept, so when i start to learn PouchDB, i found this conversion chart. My confusion is, how PouchDB handle if lets say i have multiple table, does it mean that i need to create multiple databases? Because from my understanding in pouchdb a database can store a lot of documents, but a document mean a row in sql or am i misunderstood?
The answer to this question seems to be surprisingly under-documented. While #llabball clearly gave a decent answer, I don't think that views are always the way to go.
As you can read here in the section When not to use map/reduce, Nolan explains that for simpler applications, the key is to abuse _ids, and leverage the power of allDocs().
In other words, if you had two separate types (say artists, and albums), then you could prefix the id of each type to obtain an easily searchable data set. For example _id: 'artist_name' & _id: 'album_title', would allow you to easily retrieve artists in name order.
Laying out the data this way will result in better performance due to not requiring extra indexes, and less code. Clearly however, if your data requirements are more complex, then views are the way to go.
... does it mean that i need to create multiple databases?
No.
... a document mean a row in sql or am i misunderstood?
That's right. The SQL table defines column header (name and type) - that are the JSON property names of the doc.
So, all docs (rows) with the same properties (a so called "schema") are the equivalent of your SQL table. You can have as much different schemata in one database as you want (visit json-schema.org for some inspiration).
How to request them separately? Create CouchDB views! You can get all/some "rows" of your tabular data (docs with the same schema) with one request as you know it from SQL.
To write such views easily the property type is very common for CouchDB docs. Your known name from a SQL table can be your type like doc.type: "animal"
Your view names will be maybe animalByName or animalByWeight. Depends on your needs.
Sometimes multiple-databases plan is a good option, like a database per user or even a database per user-feature. Take a look at this conversation on CouchDB mailing list.

How to arrange my Data in NoSQL (Invoices)

i'm walking my first steps with nosql databases, but so far my knowledge is very basic. I try to set up a database for a small invoice system.
In SQL i'd create 4 Tables: Products, Customers , Invoices, and a match table for Invoice and the produts.
But how to do this with nosql? Do i even build relations or just build 1 document for each invoice.
You should keep in mind that NoSQL design is not only based on data structure but also strongly on data function. So you should first ask yourself what kind of queries you need to do over your data and take it from there.
First figure out how far you want to go with denormalization and aggregation. For instance: what sets of data will often require to query or update at once? And try to keep that to a single document even if it means duplicating data from other entities (i.e. Storing customer data along with the invoice data).
So ask yourself why you want to use non relational databases, and how will you use that data. Then decide which modeling techniques to apply and how far. The highly scalable blog has a great article about NoSQL data modeling if you care to give it a read.
... or just build 1 document for each invoice.
Yes, do that for the beginning. Imagine your data in the CouchDB as read-only copy of your data in the relational database. The docs are like the result of your SQL queries.
Do i even build relations?
Of course you can, its the same as in your SQL tables. You including ids of foreign docs and name the property regarding to the relation you want to express e.g. doc.customer_id in an invoice doc can point to the doc._id of a customer doc.
Its helpful you imagine the CouchDB views as "relations" e.g. you can create a view called InvoicesByCustomer with the example above.
But summarized i would recommend to begin with the 1 document for each invoice.-approach and follow #JavoSN hint ...
So you should first ask yourself what kind of queries you need to do over your data and take it from there
... when you know that clearly its time to dig deeper into your possibilities of document designs.

What's the best approach for using SOLR with web projects?

ok, I'm totally new to SOLR and Lucene, but have got Solr running out-of-the-box under Tomcat 6.x and have just gone over some of the basic Wiki entries.
I have a few questions, and require some suggestions too.
Solr can index data in files (XML, CSV) and it can also index DBs. Can you also just point it to a URI/domain, and have it index a website in the way google would?
If I have a website with "Pages" data, so "Page Name", "Page Content" etc, and "Products Data", so "Product Name", "SKU" etc, do I need two different Schema.xml files? and if so, does that mean two different instances of Solr?
Finally, if you have a project with a large relational and normalized database, what would you say is the best approach from the 3 options below?:
Have a middleware service running in the background, which mines the DB and manually creates the relevant XML files to then send to SOLR
Have SOLR index the DB directly. In this case, would it be best to just point SOLR to views, which would abstract all the table relationships?
Any other options I'm unaware of?
Context: We're running in a Windows 2003 environment, .NET 3.5, SQLServer 2005/2008
cheers!
No, you need a crawler for that, e.g. Nutch
Yes, you want two separate indexes ( = two schema.xml) since the datasets don't seem to be related. This doesn't mean two instances of Solr, you can manage the two indexes with Cores.
As for populating the Solr index, it depends on your particular project, for example, can it tolerate stale data or does it have to absolutely fresh.
Other options to index data include:
Database triggers
If you're using some sort of ORM use its interception capabilities. For example you can use NHibernate events to update the index on update, insert or delete. If you use NHibernate and SolrNet this is taken care of automatically
I think Mauricio is dead on for his advice. The only point I would make is that when deciding to have a "middleware" indexer, or use the database directly. If your database (or the views?) map very closely to what a good Solr schema wants, then DIH is great. But, if you are indexing from multiple sources of data, or if you have to munge about the data in your database to meet what Solr would like, then having a dedicated middleware indexer is better.

Resources