CouchBase range search - search

I'm considering using Couchbase for a very read-heavy and write-heavy application. I'll also need to support searching on different document attributes, as well as range queries.
Couchbase has views that allow searching beyond key-value lookups, but they seem mainly intended for getting documents within a certain range, e.g. all documents indexed between two specified keys, rather than "give me all documents that have the genre attribute set to 'adventure'" or "give me all documents that have a creation date between 1/1/1 and 2/1/1".
Is there a way to achieve what I want without an external index?

You can definitely do both of what you describe there. You'd do both with views in Couchbase Server 2.0.
For example, a common technique when needing to search a date range is to emit a JSON array from your map function in the view. This would give you something like:
[2012, 5, 11, 16, 27, 41]
Because a JSON array is a valid value for the start key and end key when you query a view, you can specify that range directly.
Similarly, to search by attribute, emit each attribute you want to search on from your map function, along with the document ID. Then, using one of the Couchbase SDKs, you can set the include docs option when querying and the document will be fetched automatically.
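To make both ideas concrete, here is a minimal sketch that runs outside the server. The `emit` stand-in, the `created` and `genre` fields, the document IDs, and the `buildView` / `queryRange` helpers are assumptions for illustration; only the `function (doc, meta)` map signature and the array-key range idea come from Couchbase views. The key comparison is simplified and does not reproduce the server's full collation rules.

```javascript
// Stand-in for the server-provided emit(); buildView() points it at a row list.
let emit = () => {};

// Map function for date ranges: key is [year, month, day, hour, minute, second].
function byCreated(doc, meta) {
  if (!doc.created) return;
  const d = new Date(doc.created);
  emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
        d.getUTCHours(), d.getUTCMinutes(), d.getUTCSeconds()], null);
}

// Map function for attribute lookups: key is the genre itself.
function byGenre(doc, meta) {
  if (doc.genre) emit(doc.genre, null);
}

// Runs a map function over all docs and records the emitting document ID per row.
function buildView(mapFn, docs) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    emit = (key, value) => rows.push({ id, key, value });
    mapFn(doc, { id });
  }
  return rows;
}

// Simplified key comparison: element by element, shorter array first on a tie.
function compareKeys(a, b) {
  const x = Array.isArray(a) ? a : [a];
  const y = Array.isArray(b) ? b : [b];
  for (let i = 0; i < Math.min(x.length, y.length); i++) {
    if (x[i] < y[i]) return -1;
    if (x[i] > y[i]) return 1;
  }
  return x.length - y.length;
}

// Equivalent of querying a view with startkey and endkey (both inclusive).
function queryRange(rows, startkey, endkey) {
  return rows
    .filter((r) => compareKeys(r.key, startkey) >= 0 && compareKeys(r.key, endkey) <= 0)
    .map((r) => r.id);
}

// Hypothetical sample documents keyed by ID.
const bookDocs = {
  "book::1": { genre: "adventure", created: "2012-05-11T16:27:41Z" },
  "book::2": { genre: "history", created: "2012-07-02T09:00:00Z" },
  "book::3": { genre: "adventure", created: "2013-01-15T12:30:00Z" },
};

// Documents created in the first half of 2012.
const createdInRange = queryRange(buildView(byCreated, bookDocs), [2012, 1, 1], [2012, 6, 30, 23, 59, 59]);
// Documents whose genre is exactly "adventure" (start key equals end key).
const adventureDocs = queryRange(buildView(byGenre, bookDocs), "adventure", "adventure");
```

On a real cluster the two map functions would live in separate views, and the range would be passed as the start key and end key options of the view query.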

Related

What library do you use for postgres+jsonb in Node?

I would like to do more complex queries on jsonb documents that contain arrays of objects. Is there any library anyone would recommend for Node? I am using pg, but I want to do more advanced queries, like selecting the documents that have an array containing an object with a certain key/value. If there aren't any libraries that do this, does anyone know how I could do it with JSON functions etc. in psql, or could you point me to a book or resource where I could learn this advanced querying?
If you need to do really complicated things you're going to be writing SQL no matter what. But for basic queries that involve working with JSONB fields Massive (full disclosure, it's my project) has you covered, and executing handwritten prepared statements is as easy as anything else since scripts are loaded into the API.
Searching an embedded array falls into the 'really complicated' category, unfortunately, but if you know your element positions you could do this quite simply with Massive:
await db.mytable.find({
'somejson.arrayfield[0].key': 'value'
});
This would return all records from mytable where the somejson column has an arrayfield array, the first element in which contains a "key": "value" pair.
For searching, check out the Postgres docs. The specific question you have requires a lateral join on the jsonb_array_elements function like so:
SELECT somejson
FROM mytable
JOIN LATERAL jsonb_array_elements(mytable.somejson->'arrayfield') AS elements
ON TRUE
WHERE elements->>'key' = $1;
With Massive, you'd put this query in a script in your application's /db directory and run it as db.myScriptName('value'). You can use folders to group similar scripts too.
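When the element position is unknown and one row per matching document is enough, the jsonb containment operator `@>` is an alternative to the lateral join. It is a different technique from the query above, which returns one row per matching array element. The sketch below only builds a parameterized query object in the `{ text, values }` shape that pg accepts; the helper name `findByArrayElement` is hypothetical, and the table and column names are the ones used in the example above.

```javascript
// Builds a parameterized query matching rows whose somejson.arrayfield
// contains at least one object with the given key/value pair.
function findByArrayElement(key, value) {
  return {
    text: "SELECT somejson FROM mytable WHERE somejson->'arrayfield' @> $1::jsonb",
    // The right-hand operand is a one-element JSON array holding the pair to look for.
    values: [JSON.stringify([{ [key]: value }])],
  };
}

const containmentQuery = findByArrayElement("key", "value");
// With a connected pg client this could be run as: await client.query(containmentQuery)
```

Note that containment compares JSON values by type, so a string "1" and a number 1 are not treated as equal, whereas the `->>` comparison in the lateral join works on text.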

How to check for duplication before creating a new document in CouchDB/Cloudant?

We want to check whether a document with the same fields and values as the new object we are trying to save already exists in the database, to prevent duplicate items.
Note: This question is not about updating documents or about duplicated document IDs, we only check the data to prevent saving a new document with the same data of an existing one.
Preferably we'd like to accomplish this with Mango/Cloudant queries and not rely on views.
The idea so far is:
1) Scan the data that we are trying to save and dynamically create a selector that matches that document's structure. (We can't have the selectors hardcoded because we have many types of documents.)
2) Query the DB for any documents matching that selector, to see whether a document matching those criteria already exists.
However, I wonder about the performance of this approach, since many of the selector fields will not be indexed.
I would also much rather follow best practices than create something out of the blue, but I haven't been able to find any known solutions for this specific scenario.
If you happen to know of any, please share.
Option 1 - Define a meaningful ID for your documents
The ID could be a logical composition of, or a hash computed from, the values that should be unique.
If you want to check if a document ID already exists you can use the HEAD method
HEAD /db/docId
which returns 200 OK if the docId exists in the database.
If you would like to check whether the new document has the same content as the previous one, you can use the Validate Document Update Function, which lets you compare both documents.
function(newDoc, oldDoc, userCtx, secObj) {
...
}
Option 2 - Use content hash computed outside CouchDB
Before creating or updating a document, compute a hash from the values of the attributes that should be unique.
Include the hash in the document as a new attribute, e.g. "key_hash".
Create a Mango index on the "key_hash" attribute.
When a new doc is to be inserted, compute its hash and use a Mango query to look for documents with the same hash value before inserting the doc.
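The steps of Option 2 could be sketched roughly as follows. The field list, the sample documents, the index name, and the helper names are assumptions for illustration; the request bodies follow the shape of CouchDB's `_index` and `_find` endpoints but nothing is sent here. The hash assumes the unique attributes hold primitive values; nested objects would need a canonical serialization first.

```javascript
const crypto = require("crypto");

// Hypothetical list of attributes that together must be unique.
const UNIQUE_FIELDS = ["type", "title", "author"];

// Hashes the selected attributes in a fixed order, so the property order
// of the incoming object does not change the result.
function keyHash(doc, fields = UNIQUE_FIELDS) {
  const canonical = JSON.stringify(
    fields.map((f) => [f, doc[f] === undefined ? null : doc[f]])
  );
  return crypto.createHash("sha256").update(canonical).digest("hex");
}

// Body for POST /{db}/_index: a Mango index on key_hash.
const keyHashIndexRequest = {
  index: { fields: ["key_hash"] },
  name: "key-hash-index", // hypothetical index name
  type: "json",
};

// Body for POST /{db}/_find: looks for an existing document with the same hash.
function duplicateQuery(doc) {
  return { selector: { key_hash: keyHash(doc) }, limit: 1 };
}

// Returns a copy of the document with its hash attached, ready to be saved.
function withKeyHash(doc) {
  return { ...doc, key_hash: keyHash(doc) };
}

// Same unique values, different property order and an unrelated extra field.
const firstCopy = { type: "book", title: "Dune", author: "Frank Herbert", note: "first copy" };
const secondCopy = { author: "Frank Herbert", title: "Dune", type: "book", note: "second copy" };
```

If the `_find` response contains any document, the insert is skipped. The check and the insert are two separate requests, so two concurrent writers could still both pass the check.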
Option 3 - Compute hash in a View
Define a view which emits the computed hash for each document as the key.
CouchDB's JavaScript support does not include hashing functions, so this could be difficult to include in a design document.
Use Erlang to define the map function, where you have access to Erlang's hashing support.
Before creating a new document, compute its hash and query the view with it.
One solution would be to take Juanjo's and Alexis's comment one step further.
Select the keys you wish to keep unique
Put the values in a string and generate a hash
Set the document's _id to that hash
PUT the document to the database.
Check the response for failure.
If another document with the same _id value already exists in the database, the PUT request will fail.
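A minimal sketch of the last two steps, assuming the hash has already been computed from the values that must be unique. The helper names and the request description object are hypothetical; the status codes are the ones CouchDB returns for a document PUT (201 or 202 on success, 409 on conflict with an existing _id).

```javascript
// Describes the request that saves `doc` under an _id equal to its content hash.
function buildPutRequest(dbName, hash, doc) {
  return {
    method: "PUT",
    path: `/${encodeURIComponent(dbName)}/${encodeURIComponent(hash)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ...doc, _id: hash }),
  };
}

// Interprets the HTTP status of that PUT.
function interpretPutStatus(status) {
  if (status === 201 || status === 202) return "created";
  if (status === 409) return "duplicate"; // a document with this _id already exists
  return "error";
}

// Hypothetical database name, hash, and document.
const putRequest = buildPutRequest("books", "abc123", { title: "Dune" });
```

Because the uniqueness check is the PUT itself, there is no separate lookup step and no window between checking and writing.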

How to do "Not Equals" in couchdb?

Folks, I was wondering what the best way is to model documents and/or map functions so that I can run "Not Equals" queries.
For example, my documents are:
1. { name : 'George' }
2. { name : 'Carlin' }
I want to run a query that returns every document where name is not equal to 'John'.
Note: I don't have all possible names beforehand, so the query parameter can be any arbitrary text, like 'John' in my example.
In short: there is no easy solution.
You have four options:
sending a multi range query
filter the view response with a server-side list function
using a CouchDB plugin
use the mango query language
sending a multi range query
You can request the view with two ranges defined by startkey and endkey. You have to choose the ranges so that the key John is not requested.
Unfortunately, you have to track down the commit that adds this feature and compile CouchDB with it yourself. It's not included in the official source.
filter the view response with a server-side list function
It's not recommended, but you can use a list function and skip the row with the key John in your response, much as you would filter a JavaScript array.
using a CouchDB plugin
Create an additional index with e.g. couchdb-lucene. The lucene server has such query capabilities.
use the "mango" query language
It's included in the CouchDB 2.0 developer preview. It's not ready for production, but it will definitely be included in the stable release.
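As a rough illustration of two of these options, the sketch below builds a Mango request body using the `$ne` operator and, separately, filters view rows the way a list function would. The helper names and the sample rows are hypothetical, and nothing is sent to a server. A selector based only on `$ne` generally cannot be answered from an index alone, so on large databases it may scan many documents.

```javascript
// Body for POST /{db}/_find selecting on `field` not equal to `excluded`.
function notEqualsQuery(field, excluded) {
  return { selector: { [field]: { $ne: excluded } } };
}

// What the list-function option does conceptually: drop rows keyed by `excluded`.
function dropKey(rows, excluded) {
  return rows.filter((row) => row.key !== excluded);
}

// Hypothetical rows from a view that emits doc.name as the key.
const nameRows = [
  { id: "1", key: "George" },
  { id: "2", key: "Carlin" },
  { id: "3", key: "John" },
];

const notJohnQuery = notEqualsQuery("name", "John");
const notJohnRows = dropKey(nameRows, "John");
```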

Search for documents by key using Domino Data Service

Domino Data Service is a good thing, but is it possible to search for documents by key?
I didn't find anything about it in the API or the URL parameters.
I tried the above and the requests usually failed with a server timeout after 30 seconds. Calls to /api/data/documents won't serve the purpose with parameters like sortcolumn or keysexactmatch, so calls to /api/data/collections should be used for these instead.
Also, I don't think arguments like sortcolumn would work on a document collection, because there is no column to sort in the first place: columns belong to views, not documents, so a view collection should be queried instead. That also mimics the behavior of the getDocumentByKey method, which can't be called against a document, only against a view. So instead of:
http://HOSTNAME/DATABASE.nsf/api/data/documents?search=QUERY&searchmaxdocs=N
I would call
http://HOSTNAME/DATABASE.nsf/api/data/collections/name/viewname?search=QUERY&searchmaxdocs=N
and instead of
http://HOSTNAME/DATABASE.nsf/api/data/documents?sortcolumn=COLUMN&sortorder=ascending&keys=ROWVALUE&keysexactmatch=true
I would call:
http://HOSTNAME/DATABASE.nsf/api/data/collections/name/viewname?sortcolumn=COLUMN&sortorder=ascending&keys=ROWVALUE&keysexactmatch=true
where 'viewname' is the name of the view that is searched.
That is much faster, which comes in handy when working with larger databases.
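A small sketch of assembling the view-based lookup URL with properly encoded parameters. The function name and the sample host, database, view, column, and key are placeholders; the path and parameter names are the ones shown above.

```javascript
// Builds a Domino Data Service URL that looks up view entries by key.
function viewKeyLookupUrl(host, database, viewName, column, key) {
  const url = new URL(
    `http://${host}/${database}/api/data/collections/name/${encodeURIComponent(viewName)}`
  );
  url.search = new URLSearchParams({
    sortcolumn: column,
    sortorder: "ascending",
    keys: key,
    keysexactmatch: "true",
  }).toString();
  return url.toString();
}

// Placeholder values for illustration only.
const lookupUrl = viewKeyLookupUrl("example.com", "sales.nsf", "ByCustomer", "Customer", "ACME");
```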
You would do something like the following:
GET http://HOSTNAME/DATABASE.nsf/api/data/documents?search=QUERY&searchmaxdocs=N
N would be the total number of documents to return and QUERY would be your search phrase. The QUERY would be the same as doing a full text search.
For column lookups it should be something like this:
GET http://HOSTNAME/DATABASE.nsf/api/data/documents?sortcolumn=COLUMN&sortorder=ascending&keys=ROWVALUE&keysexactmatch=true
COLUMN would be the column name. ROWVALUE would be the key you are looking for.
There are further options for this. More details here.
http://infolib.lotus.com/resources/domino/8.5.3/doc/designer_up1/en_us/DominoDataService.html#migratingtowebsphereportalversion7.0

CouchDB views - Multiple join... Can it be done?

I have three document types: MainCategory, Category, and SubCategory. Each has a parentid which relates to the id of its parent document.
I want to set up a view so that I can get a list of the SubCategories which sit under a MainCategory (preferably using just a map function), but I haven't found a way to arrange the view so that this is possible.
I currently have set up a view which gets the following output -
{"total_rows":16,"offset":0,"rows":[
{"id":"11098","key":["22056",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22056",1,"11098"],"value":"Cat...."},
{"id":"33610","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"33989","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"11810","key":["22245",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22245",1,"11810"],"value":"Cat...."},
{"id":"33106","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"33321","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"11098","key":["22479",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22479",1,"11098"],"value":"Cat...."},
{"id":"11810","key":["22945",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22945",1,"11810"],"value":"Cat...."},
{"id":"33123","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33453","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33667","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33987","key":["22945",2,"null"],"value":"SubCat...."}
]}
Which query string parameters would I use to get, say, the rows whose key starts with ["22945", ... when all I have at query time is the id "11810"? (At query time I don't know the id "22945".)
If any of that makes sense.
Thanks
The way you store your categories seems to be suboptimal for the query you try to perform on it.
MongoDB.org has a page on various strategies to implement tree-structures (they should apply to Couch and other doc dbs as well) - you should consider Array of Ancestors, where you always store the full path to your node. This makes updating/moving categories more difficult, but querying is easy and fast.
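A minimal sketch of the Array of Ancestors idea applied to a CouchDB-style view, simulated in memory. The `ancestors` field, the category names, and the helper names are assumptions for illustration, and the ids are borrowed loosely from the question's output. Each document emits one row per ancestor, so a single exact-key query finds all SubCategories under a MainCategory without knowing the intermediate Category id.

```javascript
// Hypothetical documents: each stores the ids of all its ancestors, root first.
const categoryDocs = [
  { _id: "11810", type: "MainCategory", name: "Books", ancestors: [] },
  { _id: "22945", type: "Category", name: "Fiction", ancestors: ["11810"] },
  { _id: "33123", type: "SubCategory", name: "Fantasy", ancestors: ["11810", "22945"] },
  { _id: "33453", type: "SubCategory", name: "Crime", ancestors: ["11810", "22945"] },
  { _id: "11098", type: "MainCategory", name: "Music", ancestors: [] },
  { _id: "22056", type: "Category", name: "Jazz", ancestors: ["11098"] },
  { _id: "33610", type: "SubCategory", name: "Bebop", ancestors: ["11098", "22056"] },
];

// Simulates the view index: runs the map body for every document.
function buildAncestorRows(docs) {
  const rows = [];
  for (const doc of docs) {
    // Stand-in for the server-provided emit().
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    // Body of the map function: one row per [ancestor id, own type] pair.
    (doc.ancestors || []).forEach((ancestorId) => {
      emit([ancestorId, doc.type], doc.name);
    });
  }
  return rows;
}

// Equivalent of querying the view with ?key=["<mainCategoryId>","SubCategory"].
function subCategoriesOf(rows, mainCategoryId) {
  return rows
    .filter((r) => r.key[0] === mainCategoryId && r.key[1] === "SubCategory")
    .map((r) => r.id);
}

const ancestorRows = buildAncestorRows(categoryDocs);
const underBooks = subCategoriesOf(ancestorRows, "11810");
```

The trade-off mentioned above still applies: moving a Category means rewriting the `ancestors` array of every document beneath it.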

Resources