Cloudant: Indexes vs Views - couchdb

Is Cloudant's concept of Indexes native to CouchDB? It appears that Cloudant has built its Index feature on top of CouchDB; is this correct? If so, what is the difference between an Index and a View?

The Query interface is (currently) a simplifying API for creating and accessing the underlying CouchDB views. The indexes you define via the _index endpoint are actually translated into views, and those views can be accessed and used in the same way as a normal CouchDB view, as well as via the _find endpoint (note: the inverse is not true; Query doesn't currently use existing JavaScript views). The views stay in the Erlang layer, which gives us the opportunity for performance enhancements etc.
You can also filter result data to return only the document fields you're interested in, as opposed to hard-coding the returned fields in the view or running the view result through a list function.
Cheers
Simon
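To make the _index / _find relationship above a bit more concrete, here is a minimal sketch of the two JSON request bodies, built as Python dictionaries. The field names ("user_id", "title"), the index name and the limit are made-up illustrations, not anything from the question; the bodies would be POSTed to the database's _index and _find endpoints respectively.

```python
import json

# Hypothetical index definition, sent to /<database>/_index.
# "user_id" and the index name are assumptions for illustration only.
index_definition = {
    "index": {"fields": ["user_id"]},
    "name": "by-user-id",
    "type": "json",
}

# Hypothetical query, sent to /<database>/_find.
# "fields" limits the returned document fields, so nothing has to be
# hard-coded in a view or post-processed by a list function.
find_request = {
    "selector": {"user_id": "X"},
    "fields": ["_id", "title"],
    "limit": 200,
}

print(json.dumps(index_definition, indent=2))
print(json.dumps(find_request, indent=2))
```

The "fields" entry is the part that corresponds to returning only the document fields you are interested in.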

Related

When retrieving documents from a cloudant database via the web interface, what order do they arrive?

I want to retrieve a large collection of documents from Cloudant via the Web interface, with no sort directive, in 200-document pages (using bookmarks). If I retrieve all of the pages, then retrieve them all again, will the documents in the two collections be in the same order?
For views, native UTF-8 sorting is applied to the key defined in the view function.
For search indexes, the default sort order is by relevance, with the highest-scoring matches first.
As long as nothing changes that would affect the default sort order, subsequent retrievals should come back in the same order.
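If you want to check the ordering claim yourself, a rough sketch of a bookmark-following loop could look like the one below. The response shape (a "docs" list plus a "bookmark" string) is modeled on the _find endpoint and is an assumption here; fetch_page and the in-memory backend are hypothetical stand-ins for the real HTTP call.

```python
def fetch_all(fetch_page, page_size=200):
    """Collect every document by following bookmarks until a page comes back empty."""
    docs, bookmark = [], None
    while True:
        page = fetch_page(limit=page_size, bookmark=bookmark)
        if not page["docs"]:
            return docs
        docs.extend(page["docs"])
        bookmark = page["bookmark"]


def make_fake_backend(all_docs):
    """In-memory stand-in for the service: the bookmark is just the next offset."""
    def fetch_page(limit, bookmark=None):
        start = int(bookmark) if bookmark else 0
        chunk = all_docs[start:start + limit]
        return {"docs": chunk, "bookmark": str(start + len(chunk))}
    return fetch_page


backend = make_fake_backend([{"_id": f"doc-{i:04d}"} for i in range(450)])
first = fetch_all(backend)
second = fetch_all(backend)
print(len(first), first == second)
```

Against a real database you would replace the fake backend with a function that issues the request, then compare the two collected lists in the same way.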

Solr vs Elasticsearch for nested documents

I have been using Solr for my project, but I recently came across Elasticsearch, which seems very promising. My project needs to handle nested documents, and I would like to know which one does a better job. Solr added child documents only recently, but is that support as good as Elasticsearch's? Can Elasticsearch query both parent and children at once? Thanks
I've been looking into the subject recently, and to my understanding ElasticSearch makes life a lot easier when working with nested documents, although Solr also supports nesting (but is less flexible in querying).
So the features of ElasticSearch are:
"Seamlessly" supports nesting: you don't have to change your nested document structure or add specific fields. However, you need to indicate in the mapping which fields are nested when creating the index.
Supports nested queries with "nested" and "path".
Supports aggregation and filtering with nested docs, also via "nested" and "path".
With Solr you will have to:
Modify your schema.xml by adding the _root_ field
Modify your dataset so that parent and child documents have a specific distinguishing field, in particular _childDocuments_ to indicate children (see more at this question)
Aggregation and filtering on nested documents promises to be very complicated if not impossible.
Also, nested fields are not supported at all.
Recent Solr versions (5.1 and up) can be configured to support nesting (though you'll have to change your input data structure); however, the documentation is not very clear and there is not much information on the Internet because these features are recent.
The bottom line is that, as far as nested documents go, ElasticSearch can do everything that Solr can and more, with less effort and a smoother learning curve. So going with ElasticSearch seems more reasonable in this case.
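For reference, the "nested" mapping and the "nested"/"path" query mentioned above look roughly like this when written as Python dictionaries. This is only a sketch: the field names (title, comments, author, stars) are invented, and the mapping layout without a type name assumes a reasonably recent Elasticsearch version.

```python
import json

# Index creation body: "comments" is declared as a nested field in the mapping.
index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "comments": {
                "type": "nested",
                "properties": {
                    "author": {"type": "keyword"},
                    "stars": {"type": "integer"},
                },
            },
        }
    }
}

# Search body: one condition on the parent (title) and two conditions that
# must hold on the same nested child (comments), in a single request.
search_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "solr"}},
                {
                    "nested": {
                        "path": "comments",
                        "query": {
                            "bool": {
                                "must": [
                                    {"term": {"comments.author": "alice"}},
                                    {"range": {"comments.stars": {"gte": 4}}},
                                ]
                            }
                        },
                    }
                },
            ]
        }
    }
}

print(json.dumps(search_body, indent=2))
```

This also covers the original question about querying parent and children at once: the outer bool clause applies to the parent document, the nested clause to its children.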
I am not familiar with Elasticsearch, so this is only half an answer.
Solr works best with denormalized data. However, given that you have nested documents, you can use Solr in two scenarios:
Query for parent, with a child attribute
Query for all children of a parent.
You can use block join to perform the above queries. Even though you are dealing with nested levels, Solr internally manages them as denormalized documents. That is, when a parent has 2 children, you end up with three top-level documents in Solr, and Solr manages the relationship between them.
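The two block join scenarios above map onto Solr's parent and child query parsers. A small sketch of how the query strings could be assembled follows; the doc_type:parent filter and the example field values are hypothetical and depend entirely on how your documents are indexed.

```python
from urllib.parse import urlencode

# Hypothetical filter that matches all parent documents (and only parents).
PARENT_FILTER = "doc_type:parent"


def parents_with_child(child_query):
    """Scenario 1: return parents that have at least one child matching child_query."""
    return f'{{!parent which="{PARENT_FILTER}"}}{child_query}'


def children_of(parent_query):
    """Scenario 2: return all children of the parents matching parent_query."""
    return f'{{!child of="{PARENT_FILTER}"}}{parent_query}'


q1 = parents_with_child("color:red")
q2 = children_of("id:book-1")
print(q1)
print(q2)
print(urlencode({"q": q1}))  # URL-encoded form for the q request parameter
```

Note that the filter passed in which/of has to identify every parent document, not just the ones you are interested in.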

CouchDB - view recursivity

I have a question about querying CouchDB.
I have a query that generates a set of outputs. These outputs are also the result of another query.
I want to define a CouchDB view that returns all the outputs (and the inputs of a specific document). Is it possible to take the results of one map function and use them as the input of another map function?
In SPARQL, I have written this query, which is modeled as follows:
SELECT ?linkedAction
WHERE {
  ?action nova:hasOutput "doc-02-10-C" .
  ?action (nova:hasInput/^nova:hasOutput)* ?linkedAction .
}
Is it possible to do that in map/reduce ?
Best Regards.
Amin
You can try Couch-Incarnate.
Or use Cloudant chained mapreduce views (hopefully it will be integrated in CouchDB).
No, each view index is completely isolated from other views (and from other databases, for that matter). CouchDB's incremental view updates would be impossible to keep efficient if changes in one view could affect another. You'll need to perform this kind of additional processing in your application layer.
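As an illustration of what doing it in the application layer could mean for the SPARQL query in the question, here is a minimal sketch that follows the input/output chain with a breadth-first search. The action documents and their "inputs"/"outputs" fields are invented; in practice the producers lookup would be backed by a view keyed on output document id rather than an in-memory dict.

```python
from collections import deque

# Hypothetical action documents, e.g. as loaded from the database.
actions = {
    "action-A": {"inputs": ["doc-01"], "outputs": ["doc-02"]},
    "action-B": {"inputs": ["doc-02"], "outputs": ["doc-02-10-C"]},
    "action-C": {"inputs": ["doc-00"], "outputs": ["doc-01"]},
    "action-D": {"inputs": ["doc-99"], "outputs": ["doc-98"]},
}


def linked_actions(actions, target_output):
    """Return the actions that directly or transitively led to target_output."""
    # Lookup table: document id -> ids of the actions that output it.
    producers = {}
    for action_id, action in actions.items():
        for out in action["outputs"]:
            producers.setdefault(out, []).append(action_id)

    seen = set()
    queue = deque(producers.get(target_output, []))
    while queue:
        action_id = queue.popleft()
        if action_id in seen:
            continue  # already visited, also protects against cycles
        seen.add(action_id)
        # Follow each input back to the actions that produced it.
        for doc_id in actions[action_id]["inputs"]:
            queue.extend(producers.get(doc_id, []))
    return seen


print(sorted(linked_actions(actions, "doc-02-10-C")))
```

Each step of the loop corresponds to one application of hasInput/^hasOutput in the property path, and the starting action is included just as the * operator would include it.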

CouchDB map/reduce by any document property at runtime?

I come from a SQL world where lookups are done by several object properties (published = TRUE or user_id = X) and there are no joins anywhere (because of the 1:1 cache layer). It seems that a document database would be a good fit for my data.
I am trying to figure out if there is a way to pass one (or more) object properties to a CouchDB map/reduce function to find matching documents in a database without creating dozens of views for each document type.
Is it possible to pass the desired document property key(s) to match at run-time to CouchDB and have it return the objects that match (or the count of objects that match, for pagination)?
For example, on one page I want all posts with a doc.user_id of X that are doc.published. On another page I might want all documents with doc.tags[] with the tag "sport".
You could build a view that iterates over the keys in the document and emits a key of [propertyName, propertyValue]; that way you're building a single index with every prop/value pair in it. It would be massive, and I have no idea what the build performance and disk usage would be like (probably bad).
Map function would look something like:
// Note: totally untested, my CouchDB-fu is rusty
function(doc) {
  for (var prop in doc) {
    // Emit one [propertyName, propertyValue] key per document property
    emit([prop, doc[prop]], null);
  }
}
Works for the basic case of simple properties, and can be extended to be smart about arrays, and emit a prop/value pair for each item in the array. That would let you handle the tags.
To query on it, set [propertyName, propertyValue] as your query key on the view.
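To see what such an index would contain, here is a small Python simulation of the idea, including the array handling mentioned above. It is not CouchDB code: map_doc mimics the map function, the list of rows stands in for the view index, and the sample posts are invented.

```python
def map_doc(doc):
    """Yield (key, doc_id) rows: one per property, or one per item for list values."""
    for prop, value in doc.items():
        if prop.startswith("_"):
            continue  # skip _id/_rev here; that is a choice, not a requirement
        values = value if isinstance(value, list) else [value]
        for item in values:
            yield ([prop, item], doc["_id"])


docs = [
    {"_id": "post-1", "user_id": "X", "published": True, "tags": ["sport", "news"]},
    {"_id": "post-2", "user_id": "Y", "published": True, "tags": ["sport"]},
    {"_id": "post-3", "user_id": "X", "published": False, "tags": []},
]

# Stand-in for the view index: every emitted row from every document.
index = [row for doc in docs for row in map_doc(doc)]


def lookup(index, prop, value):
    """Equivalent of querying the view with key=[prop, value]."""
    return sorted(doc_id for key, doc_id in index if key == [prop, value])


print(lookup(index, "tags", "sport"))
print(lookup(index, "user_id", "X"))
```

One limitation shows up quickly: a single key lookup answers one property at a time, so a query like "user_id is X and published" would mean intersecting two result sets in your own code.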
Basically, no.
The key difference between something like Couch and a SQL DB is that the only way to query in CouchDB is essentially through the views/indexes. Indexes in SQL are optional. They exist (mostly) to boost performance. For example, if you have a small DB, your app will run just fine on SQL with 0 indexes. (Might be some issue with unique constraints, but that's a detail.)
The overall point is that the query processor in a SQL database includes other methods of data access beyond simply indexes, notably table scans, merge joins, etc.
Couch has no query processor. It has views (defined by JS) used to define B-Tree indexes.
And that's it. That's the hammer of Couch. It's a good hammer. It has served the data processing world for basically 40 years.
Indexes are somewhat expensive to create in Couch (based on data volume) which is why "temporary views" are frowned upon. And they have a cost in maintenance as well, so views need to be a conscious design element in your database. At the same time, they're a bit more powerful than normal SQL indexes as well.
You can readily add your own query processing on top of Couch, but that will be more work for you. You can create a few select views on your most popular or selective criteria, and then filter the resulting documents by other criteria in your own code. Yes, you have to do it yourself, so you have to question whether the effort involved is worth more than whatever benefits you feel Couch is offering you (HTTP API, replication, a safe, always-consistent datastore, etc.) over a SQL solution.
I ran into a similar issue like this, and built a quick workaround using CouchDB-Python (which is a great library). It's not a pretty solution (goes against the principles of CouchDB), but it works.
CouchDB-Python gives you the function "query", which allows you to "execute an ad-hoc temporary view against the database". You can read about it here
What I do is store the JavaScript function as a string in Python and then concatenate it with variables that I define in Python.
In some_function.py
variable = value
# Map function (in JavaScript), built by concatenating a Python string into the source
map_fn = """function(doc) {
    <javascript code>
    var survey_match = """ + variable + """;
    <javascript code>
}
"""
# Iterate through the rows returned by the temporary view
for row in db.query(map_fn):
    <python code>
It sure isn't pretty, and probably breaks a bunch of CouchDB philosophies, but it works.
D

CouchDB - Queries with params

I'm new to CouchDB and I know my mindset is probably still too much in the relational DB sphere, but here goes:
It appears that querying on Couch is all done via Views. I read that temporary views are very inefficient and should be avoided in production.
So my question really is how would one do effective querying with parameters (as the views do not accept them). For example if I were to use Couch to power a blog site would I have to create a new view for each post equivalent to 'select post from posts where id=1'.
I understand that I can use Lucene alongside the querying to perform a full-text search on the results, but this is only really useful for textual content, not numbers.
I'm happy creating a boatload of static views, as they can be created very simply on the fly. My worry is that this is not how Couch was supposed to be used and I'm missing something. Feel free to enlighten me.
Cheers, Chris.
Views do accept URL parameters, key being the one you are looking for. You can even limit how many rows you get and sort as well.
Your views can be indexed by arbitrary JSON keys. This means you can create a view that emits documents like so: [username, docid] => doc. Then you can query this view with http://url/to/view?key=[username, docid].
You could create a view that emits [username, type, date] => doc. Now you can get all documents of a certain type between certain dates (using the startkey and endkey URL parameters).
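As a rough sketch of how those URLs could be built: the key, startkey and endkey values are JSON, so they need to be JSON-encoded and then URL-encoded. The base URL is the placeholder from above, and the username, type and dates are made-up values.

```python
import json
from urllib.parse import urlencode

VIEW_URL = "http://url/to/view"  # placeholder, not a real endpoint


def view_query(**params):
    """Build a view URL, JSON-encoding each parameter value before URL-encoding it."""
    encoded = {name: json.dumps(value) for name, value in params.items()}
    return f"{VIEW_URL}?{urlencode(encoded)}"


# Exact match on a [username, docid] key.
exact = view_query(key=["alice", "post-1"])

# Range over a [username, type, date] key: all of alice's posts in one year.
by_range = view_query(
    startkey=["alice", "post", "2009-01-01"],
    endkey=["alice", "post", "2009-12-31"],
)

print(exact)
print(by_range)
```

The range form works because view rows are sorted by key, so everything between the two composite keys is returned in one contiguous slice of the index.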
Your example of the blog is one that CouchDB is particularly well suited for. In fact, I believe it's an example in the upcoming CouchDB book from O'Reilly.
That said, some kinds of queries are not easily handled by CouchDB alone. couchdb-lucene can help here. Don't assume that it's only good for full-text search. I've been using it to run general complex queries against the database to good effect.
