SOLR - Block Join Parent Query parser variation possibility - search

I recently changed the schema of our Solr cluster and am now indexing documents in a nested structure. Restaurants are indexed as parents, with their dishes as child docs. The BJQ (Block Join Query) parent query parser is handy for selecting all the parents that have a child satisfying a given condition:
{!parent which=doc_type:restaurant}doc_type:dish AND dish_name:burger
So the above query returns all restaurants that have burger as one of their dishes. I have a use case where I want to select only the restaurants that have several dishes at once, e.g. restaurants selling both burger and fries.
{!parent which=doc_type:restaurant}doc_type:dish AND (dish_name:burger OR dish_name:fries)
The above query returns restaurants selling either burger or fries.
It seems tedious to select restaurants selling both using BJQ. How can I write a query to achieve this?

An approach I've found uses {!bool}:
q={!bool
must="{!parent which=doc_type:restaurant}(+doc_type:dish +dish_name:burger)"
must="{!parent which=doc_type:restaurant}(+doc_type:dish +dish_name:fries)"
}
If you want to use multivalued references, note that they do not work correctly with block join.
Use single-value references as a workaround:
q={!bool must=$ref1 must=$ref2}
&ref1={!parent which=doc_type:restaurant}+doc_type:dish +dish_name:burger
&ref2={!parent which=doc_type:restaurant}+doc_type:dish +dish_name:fries
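The single-value reference pattern above extends to any number of dishes. Below is a minimal Python sketch that only builds the request parameters; the helper name build_all_dishes_query is hypothetical, and it assumes dish names are plain terms that need no escaping.

```python
def build_all_dishes_query(dishes, parent_filter="doc_type:restaurant"):
    """Return Solr request params matching parents that have every listed dish."""
    params = {}
    must_clauses = []
    for i, dish in enumerate(dishes, start=1):
        ref = f"ref{i}"
        # One block-join parent query per dish, passed as a single-value reference.
        # Assumption: the dish name is a simple term (no spaces or special characters).
        params[ref] = f"{{!parent which={parent_filter}}}+doc_type:dish +dish_name:{dish}"
        must_clauses.append(f"must=${ref}")
    # Every must clause has to match, so a parent needs a child for each dish.
    params["q"] = "{!bool " + " ".join(must_clauses) + "}"
    return params


params = build_all_dishes_query(["burger", "fries"])
for name, value in params.items():
    print(f"{name}={value}")
```

The resulting dictionary can be passed to whatever HTTP client you use, which should also take care of URL-encoding the values.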

Related

What is the use of the FROM in the Azure DocumentDB SQL-like query language?

I am using Azure DocumentDB. I only have one collection with heterogeneous document types. I am using a type parameter to distinguish between different document types. I am making use of their SQL-like query language to get documents as follows:
SELECT * FROM Collection c WHERE c.ID = 123
I am getting my connection information, including the Endpoint URI, AuthKey, Database name and Collection name, from a configuration file. It seems like I can use any value for "Collection c" and it essentially just becomes an alias for the whole collection. So what is the point of the FROM section of my query?
I think you already got it :)
FROM allows you to set an alias to refer to the collection in other clauses. This may make more sense to you when you include multiple references (e.g. using a JOIN to form a cross-product with nested array elements).
For example:
SELECT food.description, tag.name
FROM food
JOIN tag IN food.tags
WHERE food.id = "09052"
In the query above, we are referencing both the collection and nested array elements within the projection.
You can try this query out on the query demo website.
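To make the cross-product idea concrete, here is a rough Python emulation of what the JOIN query above computes. The sample documents are hypothetical; only their shape (an id, a description, and a nested tags array) is taken from the query.

```python
# Hypothetical sample documents; only the shape matters here.
foods = [
    {"id": "09052", "description": "Blueberries",
     "tags": [{"name": "fruit"}, {"name": "frozen"}]},
    {"id": "00001", "description": "Butter",
     "tags": [{"name": "dairy"}]},
]


def join_food_tags(docs, food_id):
    # "FROM food JOIN tag IN food.tags" pairs each document with every
    # element of its own tags array; the aliases food and tag name the two sides.
    return [
        {"description": food["description"], "name": tag["name"]}
        for food in docs
        for tag in food["tags"]
        if food["id"] == food_id
    ]


rows = join_food_tags(foods, "09052")
print(rows)
```

Each alias introduced in FROM and JOIN corresponds to one loop variable here, which is why the aliases become useful once more than one is in play.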

How to filter by external data not indexed in ElasticSearch

I can't find a way to do the following with ElasticSearch:
I have 2,000,000 items indexed in ElasticSearch
I have 30,000 players saved in MySQL
Every item has the name of a player as an attribute.
The online status of these players changes every 15 minutes, and can be true or false (obviously).
I would like to be able to show only items for online players.
I don't think I can index the online status with the item, since it changes so often.
I can't really get all the ids of the online players and use that as a filter since there are so many.
Would it help to index players in ElasticSearch as well? Is it possible to do some kind of JOIN with another index?
edit: After looking more into how to do joins with ES, I found out that it's actually possible with has_child if I index players in ES. Tire does not have a method for has_child, but is it possible to do it with the existing DSL?
Seems a good fit for a parent child relation between players and items, even if you don't need full text search on the parent documents, because:
each item belongs to a player
they have independent update lifecycles: when a player changes, you don't want to reindex all his items
you only want to return the children, applying a filter to their parents.
You could index your players too, in the same index as the items but within a separate type. You need to declare in your mapping that the player type is parent of the item type:
{
  "item" : {
    "_parent" : {
      "type" : "player"
    }
  }
}
After that you index the players, then your items specifying the parent player id for each of them.
You can then execute a full text search on the items, filtering them using the following has_parent filter.
{
  "has_parent" : {
    "parent_type" : "player",
    "query" : {
      "term" : {
        "status" : true
      }
    }
  }
}
This way you would only query and eventually return the items that belong to an active player.
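As a sketch of how the full text search and the has_parent filter might be combined in one request body, the Python below builds the JSON only and does not contact a cluster. It assumes an Elasticsearch version whose bool query accepts a filter clause (older releases wrapped filters differently), and the item field "name" and the helper name are hypothetical.

```python
import json


def items_for_online_players(text):
    """Build a search body: full-text match on items, filtered by the parent's status."""
    return {
        "query": {
            "bool": {
                # Assumption: items have a full-text field called "name".
                "must": {"match": {"name": text}},
                # Same has_parent clause as in the answer, used as a non-scoring filter.
                "filter": {
                    "has_parent": {
                        "parent_type": "player",
                        "query": {"term": {"status": True}},
                    }
                },
            }
        }
    }


body = items_for_online_players("sword")
print(json.dumps(body, indent=2))
```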
In order to update players you can use the update API, and maybe use scripting to avoid resending the whole document. Beware that the document is going to be deleted and reindexed under the hood anyway; that's how Lucene works.
If you want to see more examples about relations between documents in elasticsearch, have a look at the following articles:
Fun With Elasticsearch's Children and Nested Documents
Managing Relations in ElasticSearch
Depending on the type of queries that you are going to need you might encounter limitations, but given what you've written this is what I would do. Just make sure your nodes have enough memory, since elasticsearch keeps in memory a join table containing all the ids involved when using parent-child.

CouchDB view collation, join on one key, search on other values

Looking at the example described in Couch DB Joins.
It discusses view collation and how you can have one document for your blog posts, and then each comment is a separate document in CouchDB. So for example, I could have "My Post" and 5 comments associated with "My Post" for a total of 6 documents. In their example, "myslug" is stored both in the post document, and each comment document, so that when I search CouchDB with the key "myslug" it returns all the documents.
Here's the problem/question. Let's say I want to search on the author in the comments and a post that also has a category of "news". How would this work exactly?
So for example:
function(doc) {
  if (doc.type == "post") {
    emit([doc._id, 0], doc);
  } else if (doc.type == "comment") {
    emit([doc.post, 1], doc);
  }
}
That will load my blog post and comments based on this: ?startkey=["myslug"]
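For readers unfamiliar with view collation, here is a small Python simulation of that map function and a key-range query. The documents and the endkey ["myslug", 2] are assumptions made for illustration, and Python's list ordering only approximates CouchDB's collation rules.

```python
# Hypothetical documents: one post with two comments, plus an unrelated post.
docs = [
    {"_id": "myslug", "type": "post", "category": "news"},
    {"_id": "c1", "type": "comment", "post": "myslug", "author": "bob"},
    {"_id": "c2", "type": "comment", "post": "myslug", "author": "alice"},
    {"_id": "other", "type": "post", "category": "sports"},
]


def map_doc(doc):
    # Mirrors the JavaScript map function: a post sorts before its comments.
    if doc["type"] == "post":
        yield [doc["_id"], 0], doc
    elif doc["type"] == "comment":
        yield [doc["post"], 1], doc


def query_view(docs, startkey, endkey):
    # A view is a list of rows sorted by key; a query selects a key range.
    rows = sorted((row for d in docs for row in map_doc(d)), key=lambda r: r[0])
    return [(key, value) for key, value in rows if startkey <= key <= endkey]


rows = query_view(docs, ["myslug"], ["myslug", 2])
print([value["_id"] for _, value in rows])
```

The range only constrains the emitted key, which is why fields such as author or category that are not part of the key cannot be searched this way.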
However, I want to do this, grab the comments by author bob, and the post that has the category news. For this example, bob has written three comments to the blog post with the category news. It seems as if CouchDB only allows me search on keys that exist in both documents, and not search on a key in one document, and a key in another that are "joined" together with the map function.
In other words, if post and comments are joined by a slug, how do I search on one field in one document and another field in another document that are joined by the id aka. slug?
In SQL it would be something like this:
SELECT * FROM comments JOIN posts ON comments.post = posts.id WHERE comments.author = 'bob' AND posts.category = 'news'
I've been investigating CouchDB for about a week so I'm hardly qualified to answer your question, but I think I've come to the conclusion that it can't be done. View results need to be tied to one and only one document so the view can be updated. You are going to have to denormalize, at least if you don't want to do a grunt search. If anyone's come up with a clever way to do this I'd really like to know.
There are several ways that you can approximate a SQL join on CouchDB. I've just asked a similar question here: Why is CouchDB's reduce_limit enabled by default? (Is it better to approximate SQL JOINS in MapReduce views or List views?)
You can use MapReduce (not a good option)
You can use lists (This will iterate over a result set before emitting results, meaning you can 'combine' documents in a number of creative ways)
You can also apparently use 'collation', though I haven't figured this out yet (seems like I always get a count and can only use the feature with Reduce - if I'm on the right track)
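The iterate-and-combine idea behind the list option can be sketched in Python as below. This is not a CouchDB list function, just the same logic applied client-side to collated view rows; the rows and the helper name are hypothetical.

```python
from itertools import groupby

# Hypothetical view rows in collated order: (key, doc), each post (0) before its comments (1).
rows = [
    (["myslug", 0], {"type": "post", "category": "news"}),
    (["myslug", 1], {"type": "comment", "author": "bob", "text": "first"}),
    (["myslug", 1], {"type": "comment", "author": "alice", "text": "second"}),
    (["other", 0], {"type": "post", "category": "sports"}),
    (["other", 1], {"type": "comment", "author": "bob", "text": "third"}),
]


def comments_by_author_in_category(rows, author, category):
    matches = []
    # Rows for one post are adjacent, so group them by the slug in the key.
    for slug, group in groupby(rows, key=lambda r: r[0][0]):
        group = list(group)
        post = next((doc for key, doc in group if key[1] == 0), None)
        if post is None or post.get("category") != category:
            continue  # the post side of the "join" does not match
        matches.extend(
            (slug, doc) for key, doc in group
            if key[1] == 1 and doc.get("author") == author
        )
    return matches


result = comments_by_author_in_category(rows, "bob", "news")
print(result)
```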

filtering results in solr

I'm trying to build auto suggest functionality using Solr. The index contains different locations within a city and looks something like
id: unique id
name: the complete name
type: can be one of 'location_zone', 'location_subzone', 'location_city', 'outlet', 'landmark' ...
city: city id
Now, when the user types something, I want it to return suggestions only from the current city and of type location_*, something similar to WHERE city_id = 1 AND type LIKE 'location_%' in SQL.
I guess one way to do it is by faceting, but is that the right way? Will it still search in all documents and then filter the results, or will it apply the condition first as MySQL would?
PS: I'm new to Solr and would appreciate it if you could point out any mistakes in the approach.
Solr does provide filtering, using the fq parameter. What you're looking for should be something along the lines of:
&fq=city_id:1&fq=type:location_*&q=...
This page illustrates very well how and when to use filter queries in Solr.
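A minimal Python sketch of assembling such a request with repeated fq parameters follows. The prefix query on the name field and the helper name are assumptions for illustration; the q part of the answer above is left open.

```python
from urllib.parse import parse_qs, urlencode


def suggest_params(user_input, city_id):
    # A list of pairs lets the fq parameter appear more than once.
    return urlencode([
        ("q", f"name:{user_input}*"),   # assumption: prefix match on the name field
        ("fq", f"city_id:{city_id}"),   # restrict to the current city
        ("fq", "type:location_*"),      # restrict to location_* types
    ])


query_string = suggest_params("down", 1)
parsed = parse_qs(query_string)
print(query_string)
print(parsed)
```

Keeping the city and type restrictions in separate fq parameters, rather than folding them into q, leaves the main query free for the user's input.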

CouchDB views - Multiple join... Can it be done?

I have three document types: MainCategory, Category and SubCategory. Each has a parentid which relates to the id of its parent document.
So I want to set up a view so that I can get a list of SubCategories which sit under the MainCategory (preferably just using a map function)... I haven't found a way to arrange the view so this is possible.
I currently have set up a view which gets the following output -
{"total_rows":16,"offset":0,"rows":[
{"id":"11098","key":["22056",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22056",1,"11098"],"value":"Cat...."},
{"id":"33610","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"33989","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"11810","key":["22245",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22245",1,"11810"],"value":"Cat...."},
{"id":"33106","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"33321","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"11098","key":["22479",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22479",1,"11098"],"value":"Cat...."},
{"id":"11810","key":["22945",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22945",1,"11810"],"value":"Cat...."},
{"id":"33123","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33453","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33667","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33987","key":["22945",2,"null"],"value":"SubCat...."}
]}
Which query string parameters would I use to get, say, the rows whose key starts with ["22945", ... when all I have at query time is the id "11810" (I don't know the id "22945" at that point)?
If any of that makes sense.
Thanks
The way you store your categories seems to be suboptimal for the query you try to perform on it.
MongoDB.org has a page on various strategies to implement tree-structures (they should apply to Couch and other doc dbs as well) - you should consider Array of Ancestors, where you always store the full path to your node. This makes updating/moving categories more difficult, but querying is easy and fast.
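A rough Python sketch of the Array of Ancestors idea follows. The ancestors field name and the documents are hypothetical (the ids are borrowed from the output in the question), and map_doc only imitates what a map function emitting one row per ancestor would do.

```python
# Hypothetical documents: each node stores the ids of all its ancestors, root first.
docs = [
    {"_id": "11810", "type": "MainCategory", "ancestors": []},
    {"_id": "22945", "type": "Category", "ancestors": ["11810"]},
    {"_id": "33123", "type": "SubCategory", "ancestors": ["11810", "22945"]},
    {"_id": "33453", "type": "SubCategory", "ancestors": ["11810", "22945"]},
    {"_id": "11098", "type": "MainCategory", "ancestors": []},
    {"_id": "22056", "type": "Category", "ancestors": ["11098"]},
    {"_id": "33610", "type": "SubCategory", "ancestors": ["11098", "22056"]},
]


def map_doc(doc):
    # Emit one row per ancestor, keyed by [ancestor id, document type].
    for ancestor_id in doc["ancestors"]:
        yield [ancestor_id, doc["type"]], doc["_id"]


def descendants(docs, ancestor_id, doc_type):
    # Equivalent to querying the view with a single exact key.
    key = [ancestor_id, doc_type]
    return sorted(value for d in docs for k, value in map_doc(d) if k == key)


subcats = descendants(docs, "11810", "SubCategory")
print(subcats)
```

With the full path stored on each node, the lookup needs only the MainCategory id, so the intermediate Category id does not have to be known at query time.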
