How to index documents with elastic.js client? - node.js

So far I haven't found any samples of HOW the elastic.js client api (https://github.com/fullscale/elastic.js) can be used for indexing documents. There are some clues here & there but nothing concrete yet.
http://docs.fullscale.co/elasticjs/ejs.Document.html
Document ( index, type, id ): Object used to create, replace, update, and delete documents
Document > doIndex(fnCallBack): Stores a document in the given index and type. If no id is set, one is created during indexing.
Document > source (doc): Sets the source document.
Can anyone provide a sample snippet of code to show how a document object can be instantiated and used to index data?
Thanks!
Update # 1 (Sun Apr 21st, 2013 on 12:58pm CDT)
https://gist.github.com/pulkitsinghal/5430444

Your gist is correct.
You create ejs.Document objects specifying the index, type, and optionally the id of the document you want indexed. If you don't specify an id, elasticsearch will generate one for you.
You set the source to the JSON object you want indexed, then call the doIndex method, specifying a callback if needed. The node example does not index docs, but the angular and jquery examples show a basic example and can easily be used with the node client.
https://github.com/fullscale/elastic.js/blob/master/examples/angular/js/controllers.js#L30
Also have a peek at the tests:
https://github.com/fullscale/elastic.js/blob/master/tests/index_test.js#L265

elastic.js nowadays only implements the Query DSL, so it can't be used for this scenario anymore. See this commit.

Related

CouchDB check if a document exists in a validation function

I would like to see if a document exists in the database that has the name field "name" set to "a name" before allowing a new document to be added to the database.
Is this possible in CouchDB using update handlers (inside design documents)?
Seems you are looking for a unique constraint in CouchDB. The only unique constraint supported by CouchDB is based on the document ID.
You should include your "name" attribute value in the document ID if you would like document uniqueness to be based on it.
Validate document update functions defined in design documents can only use the data of the document being created/updated/deleted; they cannot use data from other documents in the database.
You can find a similar question here.
This is not widely known, but the _update endpoint is allowed to return a doc whose _id differs from the one requested. In your case, this means you need a unique document, say _id:"doc-name", which will serve as a constraint.
Then you call something like POST _design/whatever/_update/saveDependentDoc/doc-name, providing the new doc with a different _id as the request body.
Your _update function will effectively receive two docs as input (or null and newDoc if the constraint doc is missing). The function then decides what it should do: return the received doc to persist it, or return nothing.
The solution isn’t a full answer to your question, however it might be helpful in some cases.
Of course, this trick only works for updating existing docs if you know the revision.
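A minimal sketch of such an update function, written as a named function so it can be tried outside CouchDB (in a design document it would be stored as an anonymous function string). The rule applied here, rejecting the new doc when the constraint doc already exists, is only an assumption for illustration and can be inverted; the handler name `saveDependentDoc` is taken from the example URL above.

```javascript
// Update handler sketch for _design/whatever/_update/saveDependentDoc/<constraint-id>.
// constraintDoc: the document addressed by the URL, or null if it does not exist.
// req.body: the raw request body, assumed here to be the JSON of the new doc.
function saveDependentDoc(constraintDoc, req) {
  var newDoc;
  try {
    newDoc = JSON.parse(req.body);
  } catch (e) {
    // Return nothing to persist, plus an error response.
    return [null, { code: 400, body: "request body must be JSON\n" }];
  }
  if (constraintDoc !== null) {
    // Hypothetical rule: an existing constraint doc blocks the save.
    return [null, { code: 409, body: "constraint document exists\n" }];
  }
  if (!newDoc._id) {
    // CouchDB puts a freshly generated UUID on the request object.
    newDoc._id = req.uuid;
  }
  // The returned doc may carry an _id different from the one in the URL.
  return [newDoc, { code: 201, body: "saved " + newDoc._id + "\n" }];
}
```

The first element of the returned pair is what CouchDB persists (null persists nothing); the second is the HTTP response sent to the caller.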

How to check for duplication before creating a new document in CouchDB/Cloudant?

We want to check if a document already exists in the database with the same fields and values of a new object we are trying to save to prevent duplicated item.
Note: This question is not about updating documents or about duplicated document IDs, we only check the data to prevent saving a new document with the same data of an existing one.
Preferably we'd like to accomplish this with Mango/Cloudant queries and not rely on views.
The idea so far is:
1) Scan the data that we are trying to save and dynamically create a selector that matches that document's structure. (We can't hardcode the selectors because we have many types of documents.)
2) Query the DB for any documents matching that selector to see if a document already exists that matches those criteria.
However I wonder about the performance of this approach since many of the selector fields will not be indexed.
I would also much rather follow best practices than create something out of the blue, but I haven't been able to find any known solutions for this specific scenario.
If you happen to know of any, please share.
Option 1 - Define a meaningful ID for your documents
The ID could be a logical composition or a computed hash of the values that should be unique.
If you want to check if a document ID already exists you can use the HEAD method
HEAD /db/docId
which returns 200 OK if the docId exists in the database.
If you would like to check whether the new document has the same content as the previous one, you may use the Validate Document Update Function, which lets you compare both documents.
function(newDoc, oldDoc, userCtx, secObj) {
...
}
Option 2 - Use content hash computed outside CouchDB
Before creating or updating a document, compute a hash from the values of the attributes that should be unique.
Include the hash in the document as a new attribute, e.g. "key_hash".
Create a Mango index on the "key_hash" attribute.
When a new doc is to be inserted, compute its hash and search for documents with the same hash value using a Mango expression before inserting the doc.
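The steps of Option 2 can be sketched as follows in Node.js. The helper names, the index name `key-hash-index`, and the choice of SHA-256 are assumptions made for illustration; the request bodies follow the shape CouchDB expects for `POST /{db}/_index` and `POST /{db}/_find`, and sending them is left to whatever HTTP client you use.

```javascript
const { createHash } = require("crypto");

// Compute a stable hash over the fields that must be unique.
// Field names are sorted so that their order does not change the result.
function computeKeyHash(doc, uniqueFields) {
  const parts = [...uniqueFields].sort().map((field) => [field, doc[field]]);
  return createHash("sha256").update(JSON.stringify(parts)).digest("hex");
}

// Return a copy of the doc carrying the hash in the "key_hash" attribute.
function withKeyHash(doc, uniqueFields) {
  return { ...doc, key_hash: computeKeyHash(doc, uniqueFields) };
}

// Body for POST /{db}/_index: a Mango index on "key_hash".
const keyHashIndex = {
  index: { fields: ["key_hash"] },
  name: "key-hash-index", // hypothetical index name
  type: "json",
};

// Body for POST /{db}/_find: look for one existing doc with the same hash.
function duplicateQuery(hash) {
  return { selector: { key_hash: hash }, limit: 1 };
}
```

If the `_find` response contains any doc, the new document is a duplicate. Note that nested object values are hashed in their own key order, so this sketch is only stable for scalar or consistently ordered values.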
Option 3 - Compute hash in a View
Define a view which emits the computed hash for each document as the key.
CouchDB's JavaScript support does not include hashing functions, so this could be difficult to include in a design document.
Use Erlang to define the map function, where you have access to Erlang's hashing support.
Before creating a new document, query the view using the hash, which you need to compute beforehand.
One solution would be to take Juanjo's and Alexis's comment one step further.
Select the keys you wish to keep unique
Put the values in a string and generate a hash
Set the document's _id to that hash
PUT the document on the database.
Check the response for failure
If another document already exists on the database with the same _id value, the PUT request will fail.
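A rough sketch of these steps in Node.js, assuming SHA-256 as the hash and a hypothetical database URL supplied by the caller. It only builds the PUT request and interprets the status code; CouchDB answers 201 (or 202) on success and 409 Conflict when the _id is already taken.

```javascript
const { createHash } = require("crypto");

// Build a deterministic _id from the values of the keys that must stay unique.
function uniqueId(doc, uniqueKeys) {
  const values = uniqueKeys.map((key) => String(doc[key]));
  return createHash("sha256").update(JSON.stringify(values)).digest("hex");
}

// Describe the PUT request that stores the doc under its hash-based _id.
// dbUrl is an assumption, e.g. "http://localhost:5984/mydb".
function buildPut(dbUrl, doc, uniqueKeys) {
  const _id = uniqueId(doc, uniqueKeys);
  return {
    method: "PUT",
    url: dbUrl + "/" + encodeURIComponent(_id),
    body: JSON.stringify({ ...doc, _id }),
  };
}

// Map the HTTP status of the PUT to an outcome.
function interpretPutStatus(status) {
  if (status === 201 || status === 202) return "created";
  if (status === 409) return "duplicate";
  return "error";
}
```

Because the check and the write are a single request, there is no window between "look up" and "insert" in which another client could create the same document.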

How to do "Not Equals" in couchdb?

Folks, I was wondering what is the best way to model documents and/or map functions that allow me to run "Not Equals" queries.
For example, my documents are:
1. { name : 'George' }
2. { name : 'Carlin' }
I want to run a query that returns every document where name does not equal 'John'.
Note: I don't have all possible names beforehand. So the parameter in the query can be any random text like 'John' in my example.
In short: there is no easy solution.
You have four options:
sending a multi range query
filter the view response with a server-side list function
using a CouchDB plugin
use the mango query language
sending a multi range query
You can request the view with two ranges defined by startkey and endkey. You have to choose the ranges so that the key John is not requested.
Unfortunately, you have to find the commit that exists somewhere and compile your CouchDB with it. It's not included in the official source.
filter the view response with a server-side list function
It's not recommended, but you can use a list function and skip the row with the key John in your response. It's like what you would do with a JavaScript array.
using a CouchDB plugin
Create an additional index with e.g. couchdb-lucene. The lucene server has such query capabilities.
use the "mango" query language
It's included in the CouchDB 2.0 developer preview. It's not ready for production, but it will definitely be included in the stable release.
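For illustration, here is a small sketch of two of these options: the body of a Mango query using the `$ne` operator, and the array-style filtering that a list function (or the client) would apply to view rows. The helper names and the sample rows are made up for this example.

```javascript
// Body for POST /{db}/_find: every doc whose field is not equal to the value.
function notEqualsQuery(field, value) {
  return { selector: { [field]: { $ne: value } } };
}

// The same idea applied to rows already returned by a view:
// skip the row whose key is the excluded value.
function dropKey(rows, excludedKey) {
  return rows.filter((row) => row.key !== excludedKey);
}

// Sample view rows keyed by name (hypothetical data).
const rows = [
  { id: "1", key: "George", value: null },
  { id: "2", key: "Carlin", value: null },
  { id: "3", key: "John", value: null },
];

const remaining = dropKey(rows, "John").map((row) => row.key);
```

With the sample rows, `remaining` holds the keys George and Carlin, and `notEqualsQuery("name", "John")` yields the selector `{ "name": { "$ne": "John" } }`.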

REST API design for a large mongo document

I have a large document (stored in a Mongo db) and I need to expose this document as a REST API. By large I mean more than 200 fields with nested documents and nested lists of documents.
My question is simple: what is the best approach to designing a REST API for such a document?
I see 2 options :
1/ Design a single endpoint for the document
[GET] /api/documents ==> will return an array with 1 doc ...
[GET] /api/documents/:id ==> will return the document by its id
2/ design multiple endpoints for the document
[GET] /api/documents ==> returning all the first level details of the document
[GET] /api/documents/id/field1 ==> returning all inner doc (array of object) from field1 of the document
[GET] /api/documents/id/field1/nid ==> returning object nid from field1 of the document
The application which will consume the REST api will read and modify the data.
This question may seem tedious, but for me it is fundamental to the good design of the application which will consume these REST services.
Thanks in advance for your help.
I would suggest going with your first approach, with the addition of a query parameter "depth". The parameter would indicate how many levels deep to fill in the document that's being returned. It can default to 1 or ALL, depending on your clients' needs.
GET /api/documents/342?depth=8
GET /api/documents/78?depth=ALL
That gives them the flexibility to pull as much information as they need without slamming them with the whole document subtree when they just need one node.
It would also be a standard practice for /documents to always return a collection of documents, not a single document. You could use a query parameter to find the root documents:
GET /api/documents?root=true
and then use the id of the root document to pull down however much you need from /api/documents/{id}.
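One possible reading of the "depth" parameter, sketched in JavaScript: scalar fields are always returned, while nested documents and lists are only included down to the requested level. The function names and the choice to omit (rather than stub out) deeper levels are assumptions, not part of any framework.

```javascript
// Turn the raw query parameter into a number; missing or invalid values
// fall back to 1, and "ALL" means no limit.
function parseDepth(raw) {
  if (raw === "ALL") return Infinity;
  const n = Number.parseInt(raw, 10);
  return Number.isInteger(n) && n >= 1 ? n : 1;
}

// Copy a document, keeping nested objects and arrays only down to `depth` levels.
function pruneToDepth(value, depth) {
  if (value === null || typeof value !== "object") return value; // scalars always kept
  if (depth <= 0) return undefined; // too deep: omit this subtree
  if (Array.isArray(value)) {
    return value
      .map((item) => pruneToDepth(item, depth - 1))
      .filter((item) => item !== undefined);
  }
  const out = {};
  for (const [key, child] of Object.entries(value)) {
    const pruned = pruneToDepth(child, depth - 1);
    if (pruned !== undefined) out[key] = pruned;
  }
  return out;
}
```

A handler for GET /api/documents/342?depth=2 would load the document and return `pruneToDepth(doc, parseDepth("2"))`, which keeps the top-level fields plus one level of nested documents.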

CouchDB and Couchbase Document Keys

In reference material for CouchDB and Couchbase it's common guidance to store the type of a document as a parameter within the actual document.
I've got a database, where I have different documents that record certain behaviour by URL. So naturally, I use the URL as the id of the document.
The problem I find is that by using just the key as the document id, I now get clashes between documents of different types. So I have started using the type as the first part of the key like this:
{ doc._id: "rss_entry|http://www.spiegel.de/1234", [...] }
{ doc._id: "page_text|http://www.spiegel.de/1234", [...] }
Now I start to wonder why I've never seen this approach to model type in any of the documentation.
Prefixes are commonly used. In addition to support for scenarios such as yours, prefixing allows one to perform logical range queries against views. There is use of this technique in the modeling examples, but perhaps the concept is not described in as much detail as you are expecting. In the section http://docs.couchbase.com/couchbase-devguide-2.5/#modeling-documents, the documents are keyed as beer_NNNN and brewery_NNNN. Also, the section http://docs.couchbase.com/couchbase-devguide-2.5/#using-reference-documents-for-lookups goes a bit deeper into this technique. There is a counter document named user::count and then each user is keyed as user::NNNN. Additionally, there are documents in the example that are keyed as fb::NNNN for a Facebook ID, email::XXX@YYYY.com for a user's email address, etc.
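As a small illustration of prefixed keys and the range queries they enable, here is a sketch using the `type|url` layout from the question. The helper names are hypothetical, and the `\ufff0` upper bound is the commonly used high-sorting sentinel for CouchDB-style startkey/endkey ranges rather than anything specific to this data.

```javascript
const SEPARATOR = "|";

// Build a document id such as "rss_entry|http://www.spiegel.de/1234".
function makeKey(type, url) {
  return type + SEPARATOR + url;
}

// Split an id back into its type and URL (the type must not contain the separator).
function parseKey(key) {
  const i = key.indexOf(SEPARATOR);
  return { type: key.slice(0, i), url: key.slice(i + 1) };
}

// Range covering every key of one type.
function typeRange(type) {
  return { startkey: type + SEPARATOR, endkey: type + SEPARATOR + "\ufff0" };
}

// Encode the range as query parameters; the key values are sent as JSON strings.
function toQueryString(range) {
  return Object.entries(range)
    .map(([name, value]) => name + "=" + encodeURIComponent(JSON.stringify(value)))
    .join("&");
}
```

For example, `toQueryString(typeRange("rss_entry"))` produces the parameters to append to an `_all_docs` or view request so that only the `rss_entry` documents come back.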
