CouchDB Search of Multiple Fields

I am trying out CouchDB 3.1 for the first time. I want to build a dynamic query where multiple fields can be searched and every field is optional. Example of my data:
{
  "_id": "464e9db4d9216e1621b354794a0181d4",
  "_rev": "1-fade491c3e255bbbfa60f1d7462fa9a2",
  "app_id": "0000001",
  "username": "john@gmail.com",
  "transaction": "registration",
  "customer_name": "John Doe",
  "status": "complete",
  "request_datetime": "2020-01-31 12:05:00"
}
So the documents should be searchable by "transaction" alone, by "transaction" and "app_id" together, or by any combination of "app_id" / "username" / "transaction" / "status" / "request_datetime", depending on the search input from the user. (Some of the fields, such as "app_id", might be null depending on the "transaction".)
I have tried to make a view to search by "app_id" and "transaction":
function (doc) {
  if (doc.transaction && doc.app_id) {
    emit([doc.transaction, doc.app_id], doc);
  }
}
But this is not going to work when "app_id" itself is null, because the emitted key is what CouchDB indexes.
So my question is: can this be achieved with vanilla CouchDB, without GeoCouch or Lucene? Do I need to make different views for each combination of search fields?
Any help is greatly appreciated. Thank you very much.

With /db/_find, you can define a selector that accepts combination operators and condition operators. This lets you build anything from simple to very complex queries. Given your document structure, such a selector could look as follows.
"selector":{
"$and":[
{
"app_id":{
"$eq":"0000001"
}
},
{
"username":{
"$eq":"john#gmail.com"
}
},
{
"request_datetime": {
"$gte": "2020-01-31 12:00:00",
"$lt": "2020-01-31 13:00:00"
}
}
]
}
The $or operator, combined with $eq and $exists, may be used for checking fields that can be null or missing. The $regex operator gives you even more power.
Here's a simple example using curl (replace <db> with the name of your database).
curl -H 'Content-Type: application/json' -X POST http://localhost:5984/<db>/_find -d '{"selector":{"username":{"$eq": "john@gmail.com"}}}'
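Since every search field is optional, you would only add a clause to the $and array for each field the user actually filled in. As a sketch of the $or/$exists combination mentioned above (reusing the sample values from the question), a query that matches "registration" transactions whether or not "app_id" is present could look like this:
curl -H 'Content-Type: application/json' -X POST http://localhost:5984/<db>/_find -d '{
  "selector": {
    "transaction": { "$eq": "registration" },
    "$or": [
      { "app_id": { "$eq": "0000001" } },
      { "app_id": { "$exists": false } }
    ]
  }
}'
If "app_id" is stored as an explicit null rather than omitted, an extra { "app_id": { "$eq": null } } branch in the $or should cover that case as well.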

Related

Cloudant Sorting on a nullable field

I want to sort on a field, let's say name, which is indexed in the Cloudant DB. When I use the index without a sort, I get all the documents, both those that have the name field and those that don't. But when I try to sort on the name field, the documents that don't have the name field are not returned.
Is there any way to do this using the query indexes? I want all the documents in sorted order, including those that don't have the name field.
For example:
Below are some documents:
{
  "_id": 1234,
  "classId": "abc",
  "name": "Happa"
}
{
  "_id": 12345,
  "classId": "abc",
  "name": "Prasanth"
}
{
  "_id": 123456,
  "classId": "abc"
}
Below is the query I am trying to execute:
{
  "selector": {
    "classId": "abc",
    "name": {
      "or": [
        { "$exists": true }, { "$exists": false }
      ]
    }
  },
  "sort": [{ "classId": "asc" }, { "name": "asc" }],
  "use_index": "idx-classId_name"
}
I am expecting all the documents to be returned in sorted order, including the document that doesn't have the name field.
Your query makes no sense to me as it stands. You're requesting a listing of documents which either have or don't have a specific field (meaning every document), and expecting to sort them on that field which may or may not exist. Such an order isn't defined out of the box.
I'd remove the name clause from the selector, sort only on the classId field, which appears in every document, and then do the secondary partial ordering on the client side, so you can decide how you intend to mix the documents without the name field in with those that have it.
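As a rough sketch of that first approach, assuming you create a separate Cloudant Query index on classId alone (the index name idx-classId below is hypothetical), the query could be reduced to:
{
  "selector": { "classId": "abc" },
  "sort": [{ "classId": "asc" }],
  "use_index": "idx-classId"
}
The documents lacking name then come back alongside the rest, and the client decides where to slot them into the ordering.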
Another solution is to use a view instead of a Cloudant Query index. I've not tested this, but hopefully the intent is clear:
function(doc) {
  if (doc && doc.classId) {
    var name = doc.name || "[notfound]";
    emit(doc.classId + "-" + name, 1);
  }
}
which will key the docs on "classId-name", using a fixed sentinel value for docs with no name.
Querying the view should return the documents lexicographically ordered on this compound key (which you can reverse with a query parameter if you wish).
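For instance, assuming the map function above is saved in a (hypothetical) design document named sortdemo with a view called by_class_name, it could be queried like this:
curl 'https://<user>:<pass>@<user>.cloudant.com/<DB-name>/_design/sortdemo/_view/by_class_name?include_docs=true'
Adding descending=true to the query string reverses the order.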

How to make multiple IN ["V1", "V3","V5"] query

For documents that have the following structure:
{
  "countryCode": "US",
  "status": "Pending"
}
where countryCode has a limited list of options (ISO country codes) and status also has a limited set of options, I need to select only the documents that match a given list of countries and a given list of statuses.
In SQL terms it would be something like:
countryCode IN ["US","AR", "UK"] AND status IN ["Pending", "Error", "Loading"]
Is this at all possible in Cloudant / CouchDB?
With CouchDB's /db/_find, the following selector produces the desired result:
{
  "selector": {
    "$and": [
      {
        "countryCode": {
          "$in": ["US", "AR", "UK"]
        }
      },
      {
        "status": {
          "$in": ["Pending", "Error", "Loading"]
        }
      }
    ]
  }
}
Condition operators such as $in are specific to a field, and are used to evaluate the value stored in that field.
CURL
curl -H 'Content-Type: application/json' -X POST http://localhost:5984/<db>/_find -d '{"selector":{"$and":[{"countryCode":{"$in":["US", "AR", "UK"]}},{"status":{"$in":["Pending", "Error", "Loading"]}}]}}'

Query data where userID in multiples ID

I am trying to write a query and I don't know the right way to do it.
The Mongo collection structure contains multiple user IDs (uid), and I want a query that gets all the data ("Albums") where the user ID matches one of the uid values.
I do not know if the structure of the collection is right for that, and I would like to know if I should do it differently.
{
  "_id": ObjectId("55814a9799677ba44e7826d1"),
  "album": "album1",
  "pictures": [
    "1434536659272.jpg",
    "1434552570177.jpg",
    "1434552756857.jpg",
    "1434552795100.jpg"
  ],
  "uid": [
    "12814a8546677ba44e745d85",
    "e745d677ba4412814e745d7b",
    "28114a85466e745d677d85qs"
  ],
  "__v": 0
}
I searched on the internet and found this documentation, http://docs.mongodb.org/manual/reference/operator/query/in/, but I'm not certain that this is the right way.
In short, I need to know whether I am using the right structure for the collection and whether the "$in" operator is the right solution (knowing that a document may have a lot of user IDs: between 2 and 2000 maximum).
You don't need $in unless you are matching more than one possible value for a field, and the field you query does not have to be an array. $in is in fact shorthand for $or.
You just need a simple query here:
Model.find({ "uid": "12814a8546677ba44e745d85" },function(err,results) {
})
If you want "multiple" user id's then you can use $in:
Model.find(
{ "uid": { "$in": [
"12814a8546677ba44e745d85",
"e745d677ba4412814e745d7b",
] } },
function(err,results) {
}
)
Which is short for $or in this way:
Model.find(
{
"$or": [
{ "uid": "12814a8546677ba44e745d85" },
{ "uid": "e745d677ba4412814e745d7b" }
]
},
function(err,results) {
}
)
Just to answer your question, you can use the below query to get the desired result.
db.mycollection.find( {uid : {$in : ["28114a85466e745d677d85qs"] } } )
However, you may need to revisit your data structure; it looks like a many-to-many problem, and you might want to think about introducing a mid (linking) collection for that.
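For instance, a hypothetical linking collection (say, album_users) could hold one small document per album/user pair instead of an ever-growing uid array, reusing the ids from the question:
{
  "albumId": ObjectId("55814a9799677ba44e7826d1"),
  "uid": "12814a8546677ba44e745d85"
}
Fetching a user's albums then becomes a find on album_users by uid, followed by a lookup of the returned albumId values in the albums collection.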

Cloudant: How to create an index for "Sort" function?

The problem I am facing is creating the correct index to query my Cloudant database. The JSON data structure I am using looks similar to the one below.
{
  "customer": "123",
  "time": "2014-11-20"
}
I want to sort the documents based on the time. The request I used to create the index is:
curl -X POST 'https://<user>:<pass>@<user>.cloudant.com/<DB-name>/_index' -d '
{
  "index": {
    "fields": [
      "customer",
      "time"
    ]
  }
}'
And the query that I am using is:
curl -X POST 'https://<user>:<pass>@<user>.cloudant.com/<DB-name>/_find' -d '
{
  "selector": {
    "customer": "123"
  },
  "sort": [
    "time"
  ]
}'
The error code I am getting is "no_usable_index". Can anyone provide some insight into this problem?
Also, what would be different if the time were in the format:
"2014-11-20 11:50:00"? Essentially, I am trying to sort based on date and time. Is this possible?
The error message is telling you that there is no index to perform the sorting, or at least it can't find one. To help it find one, sort on customer and then on time, like so:
curl -X POST 'https://<user>:<pass>@<user>.cloudant.com/<DB-name>/_find' -d '
{
  "selector": {
    "customer": "123"
  },
  "sort": [
    "customer",
    "time"
  ]
}'
This query is functionally identical, but now Cloudant Query will find the index.
Regarding your question about the other time format, the time field would still be treated as a string for sorting purposes. Since "YYYY-MM-DD HH:MM:SS" strings sort lexicographically in the same order as they do chronologically, you'll still get the expected result.

How do I index all the revisions of a couchdb doc using elasticsearch river plugin

I know how to set up the river plugin and search across it. The problem is that if the same document is edited multiple times (multiple revisions), only the data from the latest revision is retained and the older data is lost. I intend to keep an index of all revisions for my entire CouchDB, so I don't have to keep the history in CouchDB itself and can retrieve the history of a doc through Elasticsearch instead of going to Futon.
I know the issue will be uniquely determining a key for a CouchDB doc while indexing, but we can append the "revision" number to the key so that every key is unique.
I couldn't find a way to do that in any documentation. Does anyone have an idea how to do it?
Any suggestions/thoughts are welcome.
EDIT 1:
To be more explicit, at the moment Elasticsearch saves CouchDB docs like this:
"_index": "foo",
"_type": "foo",
"_id": "27fd33f3f51e16c0262e333f2002580a",
"_score": 1.0310782,
"_source": {
"barVal": "bar",
"_rev": "3-d10004227969c8073bc573c33e7e5cfd",
"_id": "27fd33f3f51e16c0262e333f2002580a",
Here the _id from CouchDB is the same as the _id for the search index. I want the search index _id to be concat("_id", "_rev") from CouchDB.
EDIT 2 (after trying out @DaveS's solution):
So I tried the following, but it didn't work - the search still indexes the doc based on CouchDB's _id.
What I did:
curl -XDELETE 127.0.0.1:9200/_all
curl -XPUT 'localhost:9200/foo_test' -d '{
  "mappings": {
    "foo_test": {
      "_id": {
        "path": "newId",
        "index": "not_analyzed",
        "store": "yes"
      }
    }
  }
}'
curl -XPUT 'localhost:9200/_river/foo_test/_meta' -d '{
  "type": "couchdb",
  "couchdb": {
    "host": "127.0.0.1",
    "port": 5984,
    "db": "foo_test",
    "script": "ctx.doc.newId = ctx.doc._id + ctx.doc._rev",
    "filter": null
  },
  "index": {
    "index": "foo_test",
    "type": "foo_test",
    "bulk_size": "100",
    "bulk_timeout": "10ms"
  }
}'
And after this, when I search for a doc I added, I get:
_index: foo_test
_type: foo_test
_id: 53fa6fcf981a01b05387e680ac4a2efa
_score: 8.238497
_source: {
  _rev: 4-8f8808f84eebd0984d269318ad21de93
  content: {
    foo: bar
    foo3: bar3
    foo2: bar2
  }
  _id: 53fa6fcf981a01b05387e680ac4a2efa
  newId: 53fa6fcf981a01b05387e680ac4a2efa4-8f8808f84eebd0984d269318ad21de93
@DaveS - Hope this helps in explaining that Elasticsearch is not using the new path to define its "_id" field.
EDIT 3 - for @dadoonet. Hope this helps:
This is how you get all the older revision info for a CouchDB doc. You can then iterate through the revisions that are still available, fetch their data, and index them.
Get a list of all revisions of a doc id:
curl http://<foo>:5984/testdb/cde07b966fa7f32433d33b8d16000ecd?revs_info=true
{"_id":"cde07b966fa7f32433d33b8d16000ecd",
"_rev":"2-16e89e657d637c67749c8dd9375e662f",
"foo":"bar",
"foo2":"bar2",
"_revs_info":[
{"rev":"2-16e89e657d637c67749c8dd9375e662f",
"status":"available"},
{"rev":"1-4c6114c65e295552ab1019e2b046b10e",
"status":"available"}]}
And then you can retrieve each version (if its status is available) with:
curl http://<foo>:5984/testdb/cde07b966fa7f32433d33b8d16000ecd?rev=1-4c6114c65e295552ab1019e2b046b10e
{
  "_id": "cde07b966fa7f32433d33b8d16000ecd",
  "_rev": "1-4c6114c65e295552ab1019e2b046b10e",
  "foo": "bar"
}
curl http://<foo>:5984/testdb/cde07b966fa7f32433d33b8d16000ecd?rev=2-16e89e657d637c67749c8dd9375e662f
{
  "_id": "cde07b966fa7f32433d33b8d16000ecd",
  "_rev": "2-16e89e657d637c67749c8dd9375e662f",
  "foo": "bar",
  "foo2": "bar2"
}
I don't think you can, simply because, as far as I remember, CouchDB does not hold on to the older versions of a document.
After a compaction, old versions are removed.
That said, even if it were doable in CouchDB, you cannot store different versions of the same document in Elasticsearch.
To do that, you would have to define a new ID for each document, for example:
DOCID_REVNUM
That way, new revisions won't update the existing document.
The CouchDB river does not do that as of now.
I suggest that you manage that in CouchDB (i.e. create a new doc for each new version of a document) and let the standard CouchDB river index each version as a separate document.
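For example, a rough sketch of that approach, using the doc id and revisions from EDIT 3 above and a hypothetical history database called testdb_history: read each available revision, then write its body back under an _id that embeds the revision, so the river picks it up as its own document.
curl http://<foo>:5984/testdb/cde07b966fa7f32433d33b8d16000ecd?rev=1-4c6114c65e295552ab1019e2b046b10e
curl -X PUT http://<foo>:5984/testdb_history/cde07b966fa7f32433d33b8d16000ecd_1-4c6114c65e295552ab1019e2b046b10e -H 'Content-Type: application/json' -d '{"foo":"bar"}'
The body of the PUT is the revision's content with its original _id and _rev stripped, since CouchDB would otherwise reject the unknown revision.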
Hope this helps
You might consider adjusting your mapping to pull the _id field from a generated field, e.g. from the docs:
{
  "couchdoc": {
    "_id": {
      "path": "doc_rev_id"
    }
  }
}
Then "just" modify the river to concatenate the strings and add the result into the document in my_concat_field. One way to do that might be to use the script filter plugin that the couchdb river provides. E.g. something like this:
{
  "type": "couchdb",
  "couchdb": {
    "script": "ctx.doc.doc_rev_id = ctx.doc._id + '_' + ctx.doc._rev"
  }
}
You'd take the above snippet and PUT it to the river's endpoint, along with the rest of the river definition, e.g. via curl -XPUT 'localhost:9200/_river/my_db/_meta' -d '<snippet from above>'. Take care to escape the quotes as necessary.
