How to index CouchDB with Elastic Search River: in plain English

I really don't know what's wrong with my configuration, but I can't query anything after indexing (I don't even know whether I'm doing the indexing part correctly). Could someone please tell me what each of the following means and what it should be?
I have a CouchDB database called bestdb. Inside this database I have document types like product and customer.
I installed Elasticsearch version 0.18.7 and the corresponding CouchDB river, then started Elasticsearch and CouchDB. I set Elasticsearch's network.host to an IP address, 10.0.0.129, and followed the instructions in the tutorial:
curl -XPUT '10.0.0.129:9200/_river/{A}/_meta' -d '{
    "type" : "couchdb",
    "couchdb" : {
        "host" : "localhost",
        "port" : 5984,
        "db" : "bestdb",
        "filter" : null
    },
    "index" : {
        "index" : "{B}",
        "type" : "{C}",
        "bulk_size" : "100",
        "bulk_timeout" : "10ms"
    }
}'
{A}: What's this? My understanding is that this is just an internal Elasticsearch index that is not used for querying or searching, so it could be any name. Is that right?
{B}: What's this index? How is it different from the one above? What should its value be in my scenario?
{C}: Is this related to the document type in CouchDB, like product or customer?
The online tutorial just sets everything to the same value. What would my curl statement look like if I wanted to query all product documents or all customer documents?
Thank you to whoever clears this up for me.
Regards,
Mark Huang

kimchy's documentation often leaves a little bit to the imagination. :-)
A is the river name. A river is just an ES document, stored in an index named _river, with a type named whatever you want and a doc id of _meta.
B and C are the local index and type that your bestdb CouchDB _changes stream will be indexed into. They can be overridden by _index and _type fields in your CouchDB documents. If neither is supplied, both default to your CouchDB database name, i.e. bestdb/bestdb.
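
To make the three placeholders concrete, here is a minimal Python sketch that only builds the requests and does not send anything. The river name my_river, the index name bestdb_index, and the type name bestdb_type are hypothetical values chosen for illustration; the host, port, and database name come from the question.

```python
import json

ES = "http://10.0.0.129:9200"  # network.host from the question


def river_meta_request(river_name, db, index, doc_type):
    """Return (url, body) for registering a CouchDB river.

    river_name is {A}, index is {B}, doc_type is {C}.
    """
    url = f"{ES}/_river/{river_name}/_meta"
    body = {
        "type": "couchdb",
        "couchdb": {"host": "localhost", "port": 5984, "db": db, "filter": None},
        "index": {
            "index": index,
            "type": doc_type,
            "bulk_size": "100",
            "bulk_timeout": "10ms",
        },
    }
    return url, json.dumps(body)


def search_url(index, doc_type):
    """Searches go to {B}/{C}, not to the _river index."""
    return f"{ES}/{index}/{doc_type}/_search"


# Hypothetical names, chosen only to show that A, B and C are independent.
url, body = river_meta_request("my_river", "bestdb", "bestdb_index", "bestdb_type")
print(url)
print(search_url("bestdb_index", "bestdb_type"))
```

The point of the sketch is that {A} only names the river itself, while queries are sent to the index and type given as {B} and {C}.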

Related

Querying mongoDB using pymongo (completely new to mongo/pymongo)

If this question seems too trivial then please let me know in the comments, I will do further research on how to solve it.
I have a collection called products where I store details of a particular product from different retailers. The schema of a document looks like this -
{
"_id": "uuid of a product",
"created_at": "timestamp",
"offers": [{
"retailer_id": 123,
"product_url": "url - of -a - product.com",
"price": "1"
},
{
"retailer_id": 456,
"product_url": "url - of -a - product.com",
"price": "1"
}
]
}
The _id of a product is system generated. Consider a product like 'iPhone X'. This will be a single document with URLs and prices from multiple retailers like Amazon, eBay, etc.
When a new URL comes into the system, I need to check whether this URL already exists in our database. The obvious way to do this is to iterate over every offer of every product document and see whether the product_url field matches the input URL. That would require loading all the documents into memory and iterating through the offers of every product one by one. So my questions are:
Is there a simpler way to achieve this using pymongo?
Or does my database schema need to change, since this basic check_if_product_url_exists() is too complex?

MongoDB provides searching within arrays using dot notation.
So your query would be:
db.collection.find({'offers.product_url': 'url - of -a - product.com'})
The same syntax works in MongoDB shell or pymongo.
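
As a rough sketch of how that could look from Python, the helper below wraps the dot-notation filter in the check_if_product_url_exists() function the question mentions. It assumes the collection argument behaves like a pymongo Collection; the helper name product_url_filter is made up for illustration.

```python
def product_url_filter(url):
    """Dot notation reaches into every element of the offers array."""
    return {"offers.product_url": url}


def check_if_product_url_exists(collection, url):
    """Return True if any product has an offer with this URL.

    `collection` is expected to behave like a pymongo Collection.
    """
    # Project only _id so the server does not send the whole document back.
    return collection.find_one(product_url_filter(url), {"_id": 1}) is not None
```

With pymongo this would be called with something like MongoClient()["shop"]["products"] as the collection, where the database name shop is an assumption. On a large collection, an index on offers.product_url (for example via collection.create_index("offers.product_url")) would probably be needed to avoid scanning every document.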

Unable to full text search in Solr

I have some data in Solr. I want to search for the name Chinmay Sahu. As the output below shows, I get 3 results instead of 1, because the name is matched partially.
I want an exact match, so that only the documents whose name is Chinmay Sahu are returned.
Output:
"docs": [
{
"id": "741fde46a654879949473b2cdc577913",
"content_id": "1277",
"name": "Chinmay Sahu",
"_version_": 1596995745829879800
},
{
"id": "4e98d680efaab3afe051f3ddc00dc5f2",
"content_id": "1825",
"name": "Chinmay Panda",
"_version_": 1596995745829879800
},
{
"id": "741fde46a654879949473b2cdc577913",
"content_id": "1259",
"name": "Sasmita Sahu",
"_version_": 1596995745829879800
}
]
Query:
name:Chinmay Sahu
Expected :
"docs": [
{
"id": "741fde46a654879949473b2cdc577913",
"content_id": "1277",
"name": "Chinmay Sahu",
"_version_": 1596995745829879800
}
]
Please help

Try this:
name:"Chinmay Sahu"
You need a phrase query to match the exact name.
I am guessing that your name field uses the standard tokenizer, which splits the value into tokens on whitespace. At index time, the documents "Chinmay Sahu" and "Chinmay Panda" therefore both get a token chinmay, and "Chinmay Sahu" and "Sasmita Sahu" both get a token sahu.
When you search using
name:Chinmay Sahu
only the first term is tied to the name field. A term with no field name in front of it is searched in the default field (the default field was removed in Solr 7.3, so this depends on which version of Solr you are using). Solr therefore runs the query roughly as
name:chinmay default_field:sahu
With the default OR operator, a document matches if either clause matches. Two documents contain chinmay in name, and the third presumably matches sahu through the default field, so you get all 3 docs.
I don't know what your default field is. Can you post your Solr schema? That way we can explain exactly why you are seeing those 3 docs.
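
If the name comes from code rather than being typed by hand, the phrase query can be assembled with a small helper. This is only a sketch: the function name phrase_query is hypothetical, and it escapes just the two characters that would break out of a quoted phrase (backslash and double quote).

```python
def phrase_query(field, text):
    """Build field:"text" so all words must appear together, in order."""
    # Escape backslashes first, then quotes, so the phrase stays one unit.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(phrase_query("name", "Chinmay Sahu"))  # name:"Chinmay Sahu"
```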

Since root545 already explained that field:foo bar will search for foo in field and bar in the default search field, I'll suggest that it seems like you don't want to concern yourself with the exact Lucene syntax for searching. The edismax query parser is well suited for separating the typed search string from what fields are being searched and whether you want all tokens to match.
The query in that case would be just Chinmay Sahu, while you'd set q.op=AND (all terms must match), defType=edismax (use the edismax query parser) and qf=name (search the name field):
q=Chinmay Sahu&q.op=AND&defType=edismax&qf=name
You can also tune the different phrase parameters to make sure that names with the tokens in the exact same sequence will be boosted higher than those that have them in the opposite sequence (i.e. Sahu Chinmay).
If this is a programmatic search where no user is actually typing in the suggestion, using a phrase search as suggested is the way to go (name:"Chinmay Sahu").
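
For reference, the edismax parameters above can be turned into a request URL with Python's standard library. The base URL, and in particular the core name mycore, is an assumption for illustration; the sketch builds the URL without sending it.

```python
from urllib.parse import urlencode

# Assumed Solr location; "mycore" is a placeholder core name.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def edismax_url(user_input, field):
    """URL for an edismax search where every typed term must match `field`."""
    params = {
        "q": user_input,       # raw user input, no Lucene syntax needed
        "q.op": "AND",         # all terms must match
        "defType": "edismax",  # use the edismax query parser
        "qf": field,           # field(s) to search
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


print(edismax_url("Chinmay Sahu", "name"))
```

urlencode takes care of encoding the space in the search string, so the user input can be passed through unchanged.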

I would suggest using a query like
name:(Chinmay Sahu)
and making sure the default operator is AND, either in the settings or in the query string (q.op=AND).
With that approach you can use user input much more easily, since you don't need to parse it as much.

Kibana and groovy scripting

I was looking for a way to calculate a ratio in Kibana. After a lot of research I found one approach: using the "JSON Input" feature in a visualisation.
All my information is in one index, with 2 types of documents (boots and reboots).
I am looking for a script that counts the number of documents of type boots, does the same for the reboots type, and then divides the second count by the first.
It sounds really easy, but I have not found a way to do it, and I am not yet familiar enough with Groovy to write it myself.
I found many ways to manipulate document values (doc['mydocname'].values etc.), but nothing about the type.
Thanks in advance.
EDIT: I tried this
{
"aggs" : {
"boots_count" : { "value_count" : { "_type" : "boots" } }
}
}
This is supposed to count the number of values of a field (here the field _type) in the index. But when I put it into "JSON Input" in a visualisation, it results in an error:
Error: Request to Elasticsearch failed: {"error":"SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[BbXJ0O6tRxa_OcyBfYCGJQ][informationbe][0]: SearchParseException[[informationbe][0]: from[-1],size[0]: Parse Failure [Failed to parse source [{\"size\":0,\"aggs\":{\"2\":{\"terms\":{\"field\":\"#sitePoste\",\"size\":5,\"order\":{\"1\":\"desc\"}},\"aggs\":{\"1\":{\"avg\":{\"script\":\"0\",\"lang\":\"expression\",\"ratio\":{\"boots_count\":{\"value_count\":{\"_type\":\"boots\"}}}}}}}}
I am doing something wrong, but what?
EDIT2: On the other hand, I am trying scripted fields, with something like this using a Lucene expression:
doc['_type:boots'].count / doc['_type:reboots'].count
But it doesn't work either. I am pretty confident about the "doc['_type:boots']" part, so I guess the problem is in the "XXX.count" part.
After many attempts, I understand better how this works. The scope of a scripted field is the single document, not the whole index, so I can't count values across the whole index from within a document.
I am looking for a workaround, and I'll post it if I find something interesting.

I finally solved my problem:
I added a scripted field that is 1 if the type of the document is boots and 0 otherwise. Then I created a search containing only boots and reboots documents (filter _type:boots _type:reboots) and calculated the average of the scripted field in a metric.
Everything works well!
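
To see what that metric measures, here is a small Python illustration with made-up data. The average of a 0/1 flag is the share of boots among all boots and reboots documents, which is not quite the reboots-to-boots ratio asked for originally, but the latter can be derived from it. The function names are hypothetical.

```python
def boots_share(doc_types):
    """Average of the 0/1 scripted field: boots / (boots + reboots)."""
    flags = [1 if t == "boots" else 0 for t in doc_types]
    return sum(flags) / len(flags)


def reboots_per_boot(share):
    """Convert the share p into reboots / boots = (1 - p) / p."""
    return (1 - share) / share


# Made-up sample: 3 boots documents and 1 reboots document.
types = ["boots", "boots", "reboots", "boots"]
share = boots_share(types)
print(share)  # 0.75
print(reboots_per_boot(share))
```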

How do I convert this MongoDB Date value?

This has me stumped. I am fairly new to NoSQL and node.js development, so "what the heck" moments are pretty common, but I cannot come to grips with this one on my own.
We are inserting documents into a Mongo user collection and everything is working as it should. What I do not understand, and would like some insight on, is that the _id value created for each user also acts as a date stamp. I can sort on this field, and the user names correspond to the sign-up log entries. Yet for the life of me I cannot find a way to convert it to a normal, human-readable timestamp.
520193b4571be99a06000031 is a typical date code.
Here is a snippet of the collection.
{
"_id" : ObjectId("520193b4571be99a06000031"),
"email" : "this_user@gmail.com",
"google" : {
"email" : "XXXXXXXXXXXXXXXXXXXXXX",
"expires" : ISODate("2012-10-11T18:30:13.611Z"),
"accessToken" : "A_Reallly_REALLY_LONG one!!!!####$$$$$$%%%%%%%"
},
"login" : "google:XXXXXXXXXXXXXXXXXXXXXX"
}

Per the docs:
ObjectId("520193b4571be99a06000031").getTimestamp()

Look here:
http://docs.mongodb.org/manual/reference/object-id/
or http://api.mongodb.org/java/2.0/org/bson/types/ObjectId.html
Then create an ObjectId and get the date from it.
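
The reason this works is that the first 4 bytes (8 hex characters) of an ObjectId hold the creation time in seconds since the Unix epoch. A minimal Python sketch that decodes this by hand, without a MongoDB driver, could look like the following; the function name objectid_timestamp is made up.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex):
    """Decode the creation time embedded in a 24-character ObjectId string."""
    # The first 8 hex characters are a big-endian count of seconds since the epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_timestamp("520193b4571be99a06000031").isoformat())
```

For the ID in the question this gives a date in August 2013, which is the kind of sign-up time one would expect to line up with the log entries.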

Cannot add design documents (beginning with "_") in CouchDB

When I try to add design documents (beginning with "_") I get an error "Only reserved document ids may start with underscore." How can I add a design document?

According to the Definitive Guide, a design document like this one:
{
"_id" : "_design/example",
"views" : {
"foo" : {
"map" : "function(doc){ emit(doc._id, doc._rev)}"
}
}
}
can be added to the database named basic with a curl command like this:
curl -X PUT http://127.0.0.1:5984/basic/_design/example --data-binary @mydesign.json
Personally, I find it much easier to use CouchApp to add and manage design documents. This section of the Definitive Guide describes how to install and use it.
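
The same PUT can be sketched in Python with only the standard library. This builds the request but does not send it; the helper name design_doc_request is hypothetical, and the host and database name are taken from the curl command above.

```python
import json
import urllib.request

design_doc = {
    "_id": "_design/example",
    "views": {"foo": {"map": "function(doc){ emit(doc._id, doc._rev)}"}},
}


def design_doc_request(base_url, db, doc):
    """Build a PUT request whose path ends in the design document's _id."""
    url = f"{base_url}/{db}/{doc['_id']}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = design_doc_request("http://127.0.0.1:5984", "basic", design_doc)
print(req.get_method(), req.full_url)
# To actually send it against a running CouchDB: urllib.request.urlopen(req)
```

The relevant detail is that the document ID in the URL starts with the reserved _design/ prefix, which is what CouchDB accepts for IDs beginning with an underscore.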
