change precedence in a query, 'AND' or 'OR', without () - search

I'm running a search query against the Lucene search engine.
Because I'm doing this in a Railo setup, I'm running into the parenthesis bug,
so I can't run the query I need:
"exact term here"^10 OR "less exact term"^5 OR ("loose term1" AND "loose term2" AND "loose term3")
but only like:
"exact term here"^10 OR "less exact term"^5 OR "loose term1" AND "loose term2" AND "loose term3"
But this would behave like:
("exact term here"^10 OR "less exact term"^5 OR "loose term1") AND "loose term2" AND "loose term3"
Is there a way to get around this without any parentheses?
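To make the difference between the two groupings concrete, here is a minimal Python sketch. It does not use Lucene at all; the function names and the boolean flags are hypothetical stand-ins for "this document matches that phrase".

```python
def intended(exact, less_exact, loose1, loose2, loose3):
    """exact OR less_exact OR (loose1 AND loose2 AND loose3)"""
    return exact or less_exact or (loose1 and loose2 and loose3)


def observed(exact, less_exact, loose1, loose2, loose3):
    """(exact OR less_exact OR loose1) AND loose2 AND loose3"""
    return (exact or less_exact or loose1) and loose2 and loose3


# A document that contains only the exact phrase and none of the loose terms.
doc = dict(exact=True, less_exact=False, loose1=False, loose2=False, loose3=False)

print(intended(**doc))  # True: the exact phrase alone is enough
print(observed(**doc))  # False: loose term2 and loose term3 are also required
```

So under the observed grouping, a document that matches only the exact phrase is dropped, which is the opposite of what the boosts are meant to achieve.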

Related

SOLR 6.6 eDisMax query not respecting mm parameter

I am using Solr 6.6.2, and up until now we have been using the DisMax query parser along with the mm parameter, which works just as expected. A small example:
defType=dismax&q=samsung+iphone&qf=name+brand&mm=1 returns the expected result set, containing both iPhones and Samsung products. However, when I do the exact same thing and only change defType to edismax (keeping mm=1), no results are returned. I have read the documentation of the eDisMax query parser in the Apache Solr reference guide, and it clearly says that eDisMax is an extension of DisMax, so I expect mm to behave the same in both. Further down the same page, the documentation also gives an example of the mm parameter that should work as I expect.
Is this a bug, or am I missing something very obvious? I'd love some help here.
EDIT: Adding the Solr params that are sent along with the request.
eDisMax
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":0,
    "params":{
      "mm":"1",
      "q":"samsung iphone",
      "defType":"edismax",
      "indent":"on",
      "fl":"name, category_names, score",
      "fq":"channel:outbound",
      "wt":"json",
      "_":"1515855895772"}},
  "response":{"numFound":0,"start":0,"maxScore":0,"docs":[]
  }}
DisMax
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":22,
    "params":{
      "mm":"1",
      "q":"samsung iphone",
      "defType":"dismax",
      "indent":"on",
      "fl":"name, category_names, score",
      "fq":"channel:outbound",
      "wt":"json",
      "_":"1515855895772"}},
  "response":{"numFound":2147,"start":0,"maxScore":12.172616,"docs":[
      {
        "name":"Apple iPhone 5s",
        "score":12.172616},...]
  }}
I'm not sure whether anyone else has faced this issue, but I feel it is not well documented.
Our Solr q.op defaults to AND, so a search term like "foo bar cat" was searched as foo AND bar AND cat. With mm=2 and defType=dismax the query was loosely parsed to (foo AND bar AND cat)~2. So far so good.
When I changed defType to edismax, the same query "foo bar cat" with mm=2 was parsed as (foo AND bar AND cat). Note the missing ~2.
If I now add q.op=OR, interesting things start to happen. The parsed query becomes (foo bar cat)~2, which is exactly what dismax does. You can read (foo bar cat)~2 as "find me at least 2 of these terms". If I remove the mm parameter, the parsed query is (foo bar cat), and since q.op=OR it returns any document that contains any of the 3 terms.
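The eDisMax behavior described above can be modeled with a small Python sketch. This is a toy model of the observations in this answer, not Solr's implementation: the function name is hypothetical, documents are plain sets of terms, and only an integer mm is handled (Solr's mm also accepts percentages and conditional expressions).

```python
def edismax_matches(doc_terms, query_terms, q_op="OR", mm=None):
    """Toy model of the eDisMax behavior described above (not Solr itself)."""
    present = sum(1 for term in query_terms if term in doc_terms)
    if q_op == "AND":
        # As observed above: with q.op=AND every term is required and mm has no effect.
        return present == len(query_terms)
    # With q.op=OR, mm is the minimum number of terms that must match.
    # Without mm, a single matching term is enough.
    required = 1 if mm is None else mm
    return present >= required


query = ["foo", "bar", "cat"]
doc = {"foo", "bar"}  # contains two of the three terms

print(edismax_matches(doc, query, q_op="AND", mm=2))  # False: parsed as (foo AND bar AND cat)
print(edismax_matches(doc, query, q_op="OR", mm=2))   # True: parsed as (foo bar cat)~2
```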
Either way, it doesn't really matter, since the same behavior can be achieved. It's just confusing that the behavior of DisMax and eDisMax is somewhat inconsistent.
Hope this helps someone else.

poor search performance for certain wildcard queries

I am having performance issues when using wildcard searches for certain letter combinations, and I am not sure what else I need to do to improve it. All of my documents follow an envelope pattern that looks something like the following.
<pdbe:person-envelope>
  <person xmlns="http://schemas.abbvienet.com/people-db/model">
    <account>
      <domain/>
      <username/>
    </account>
    <upi/>
    <title/>
    <firstName>
      <preferred/>
      <given/>
    </firstName>
    <middleName/>
    <lastName>
      <preferred/>
      <given/>
    </lastName>
  </person>
  <pdbe:raw/>
</pdbe:person-envelope>
I have a field defined called name, which includes the firstName and lastName paths:
{
  "field-name": "name",
  "field-path": [
    {
      "path": "/pdbe:person-envelope/pdbm:person/pdbm:firstName",
      "weight": 1
    },
    {
      "path": "/pdbe:person-envelope/pdbm:person/pdbm:lastName",
      "weight": 1
    }
  ],
  "trailing-wildcard-searches": true,
  "trailing-wildcard-word-positions": true,
  "three-character-searches": true
}
When I do some queries using search:search, some come back fast, whereas others come back slow. This is with the filtered queries.
search:search("name:ha*",
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="name">
      <word>
        <field name="name"/>
      </word>
    </constraint>
    <return-plan>true</return-plan>
  </options>
)
I can see from the query plan that it is going to filter over all 136547 fragments in the db. But this query works fast.
<search:query-resolution-time>PT0.013205S</search:query-resolution-time>
<search:snippet-resolution-time>PT0.008933S</search:snippet-resolution-time>
<search:total-time>PT0.036542S</search:total-time>
However, a search for name:tj* takes a long time, and it also filters over all 136547 fragments.
<search:query-resolution-time>PT6.168373S</search:query-resolution-time>
<search:snippet-resolution-time>PT0.004935S</search:snippet-resolution-time>
<search:total-time>PT12.327275S</search:total-time>
I have the same indexes on both. Are there any other indexes I should be enabling when I am specifically just doing a search via the field constraint? I have these other indexes enabled on the database itself, in general.
"collection-lexicon": true,
"triple-index": true,
"word-searches": true,
"word-positions": true
I tried an unfiltered query, but that did not help: I got a bunch of matches on the whole document rather than on the fields I wanted. I even tried setting the fragment root to just my person element, but that did not seem to help either.
"fragment-root": [
  {
    "namespace-uri": "http://schemas.abbvienet.com/people-db/model",
    "localname": "person"
  }
]
Thanks for any ideas.
Fragment roots are helpful if you want to use a searchable expression for that person element, and mostly if it occurs multiple times in one document. They won't constrain your current search to that element.
In your case you enabled a number of relevant options, but the wildcard option only works for 4 characters or more. If you want to search on wildcards with fewer characters, you need to enable the three, two and one character search options.
Both search phrases mentioned above contain two characters followed by a wildcard. Since you only enabled the three character option, MarkLogic had to rely on filtering. The fact that some run fast and some slow is probably because of caching. If you repeat the same query, MarkLogic will return the result from cache.
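As a rough illustration of enabling the shorter wildcard options on the field, here is a Python sketch that extends the field definition from the question. The keys "two-character-searches" and "one-character-searches" are assumptions, named by analogy with the "three-character-searches" key already in use; check the MarkLogic documentation for the exact property names before applying them.

```python
import json

# Field settings taken from the question (field-path omitted for brevity).
field = {
    "field-name": "name",
    "trailing-wildcard-searches": True,
    "trailing-wildcard-word-positions": True,
    "three-character-searches": True,
}

# ASSUMED key names, mirroring "three-character-searches" above.
field.update({
    "two-character-searches": True,
    "one-character-searches": True,
})

print(json.dumps(field, indent=2))
```

Shorter character indexes cost extra disk space and load time, so it is worth enabling only the lengths your queries actually need.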
For performance testing you would either have to restart MarkLogic regularly to flush caches, or search on (semi) random strings so that MarkLogic cannot serve results from cache. Or maybe both.
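One way to generate such (semi) random search strings is sketched below in Python. The helper name and its parameters are hypothetical; it only produces query text in the name:xx* form used in the question and leaves sending the queries to MarkLogic up to you.

```python
import random
import string


def random_wildcard_queries(count, prefix_len=2, constraint="name", seed=None):
    """Return `count` query strings such as 'name:qz*' with random lowercase prefixes."""
    rng = random.Random(seed)  # pass a seed to make a test run repeatable
    queries = []
    for _ in range(count):
        prefix = "".join(rng.choices(string.ascii_lowercase, k=prefix_len))
        queries.append(f"{constraint}:{prefix}*")
    return queries


for query in random_wildcard_queries(5):
    print(query)
```

Because each run uses different prefixes, a repeated timing measurement is less likely to be answered from cache.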
HTH!

Group and term query combination in Lucene

I am new to Lucene and I want to filter my search results based on 3 criteria:
value of field document_type should be Product
value of field brand_id should be 4
value of field family_id should be any of the values from (121, 232, 343)
So what I basically want is to have combinations like following in the search result:
document_type:Product AND brand_id:4 AND family_id:121
document_type:Product AND brand_id:4 AND family_id:232
document_type:Product AND brand_id:4 AND family_id:343
I thought document_type:Product AND brand_id:4 AND family_id:(121 232 343) should do the trick, but while parsing this query the standard analyzer turns Product into product, even though the field document_type was indexed with Field.Index.NOT_ANALYZED and Field.Store.YES for the value Product.
I was wondering if it is possible to create a boolean query by combining 3 possible queries for the given 3 cases.
I am quite new with Lucene, could someone help me with it?
Thanks.
Query.combine(Query[]) worked like a charm for the given situation.
The documentation for the given method is available here.
The query turned out as follows once combine was applied:
(+document_type:Product +brand_id:4 +family_id:121) (+document_type:Product +brand_id:4 +family_id:232) (+document_type:Product +brand_id:4 +family_id:343)
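For illustration, the shape of that combined query can be reproduced with a few lines of Python. This sketch only builds the query string; it does not call Lucene, and the helper name is hypothetical.

```python
def combined_query(document_type, brand_id, family_ids):
    """Build one required-clause group per family id, joined as optional groups."""
    clauses = [
        f"(+document_type:{document_type} +brand_id:{brand_id} +family_id:{family_id})"
        for family_id in family_ids
    ]
    return " ".join(clauses)


print(combined_query("Product", 4, [121, 232, 343]))
```

Each parenthesized group requires all three fields to match, and a document only has to satisfy one of the groups.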
Thanks.

Need to use Solr dismax handler but i have no q parameter???

Hi,
I am trying to build a Solr query using the dismax handler, but I have no q parameter because I have to match directly on fields.
hl.fragsize=200&mm=1&facet=on&facet.mincount=1&qf=text+&wt=json&hl=true&rows=50&fl=*+score&start=0&q=*:*&fq=jSFunT:("Fresher"+OR+"Developer+/+Programmer+/+Coder")&fq=jNMinEx:[2+TO+*]&fq=jNMaxEx:[2+TO+5]&fq=jNMinSal:[-1+TO+*]&fq=jNMaxSal:[-1+TO+-1]&bq=jSFunT:("Developer+/+Programmer+/+Coder")^1&bq=jSkill:(HTML)^2&bq=jCID:(41449)^8&bq=jJT:(Developer+)^8&bq=jLoc:(Mumbai-Thane+)^4&bq=jINDT:("IT(Software,+Dotcom,+Infra.Mgmt.%26+UI+Design)")^1
Or, in a more readable form:
&mm=1
&qf=text
&wt=json
&hl=true
&rows=50
&fl=*+score
&start=0
&q=*:*
&fq=jSFunT:("Fresher"+OR+"Developer+/+Programmer+/+Coder")
&fq=jNMinEx:[2+TO+*]
&fq=jNMaxEx:[2+TO+5]
&fq=jNMinSal:[-1+TO+*]
&fq=jNMaxSal:[-1+TO+-1]
&bq=jSFunT:("Developer+/+Programmer+/+Coder")^1
&bq=jSkill:(HTML)^2
&bq=jCID:(41449)^8
&bq=jJT:(Java Developer)^8
&bq=jLoc:(Mumbai-Thane)^4
&bq=jINDT:("IT(Software,+Dotcom,+Infra.Mgmt.%26+UI+Design)")^1
Here none of the "bq" parameters will work because qt=dismax is not supplied, and if I do supply it the whole query fails.
Can anyone help me out? I would be very thankful.
Have a look at the q.alt parameter, which lets you specify a fall back query:
q.alt=*:*
If you replace your q parameter with that one, dismax should play just fine.
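A minimal Python sketch of what the request parameters might look like with that change. It only builds the query string with the standard library; the filter and boost values are a subset of those in the question, and selecting the parser with defType=dismax rather than qt=dismax is an assumption about how your request handler is configured.

```python
from urllib.parse import urlencode

params = [
    ("defType", "dismax"),  # assumption: parser selected via defType
    ("q.alt", "*:*"),       # fallback query used because no q is sent
    ("qf", "text"),
    ("mm", "1"),
    ("fq", 'jSFunT:("Fresher" OR "Developer / Programmer / Coder")'),
    ("fq", "jNMinEx:[2 TO *]"),
    ("bq", "jSkill:(HTML)^2"),
    ("bq", "jCID:(41449)^8"),
    ("rows", "50"),
    ("wt", "json"),
]

# urlencode accepts a list of pairs, so repeated fq and bq parameters are preserved.
query_string = urlencode(params)
print(query_string)
```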

Search on multiple keywords with tweetsharp

I am trying to use TweetSharp to search Twitter for specific keywords, but I want to search on multiple keywords. The following code works, but only for one keyword. Does anyone know how to do an "or" search with TweetSharp?
ITwitterLeafNode searchQuery = FluentTwitter.CreateRequest()
    .Search().Query()
    .ContainingHashTag("heart")
    .Since(sinceID)
    .InLanguage("EN")
    .Take(tweetCount)
    .AsJson();
var results = searchQuery.Request().AsSearchResult();
Twitter's standard search operators seem to work fine with TweetSharp, so you could use the Containing() method instead:
var qry = FluentTwitter.CreateRequest()
    .Search().Query()
    .Containing("#heart OR #soul");
Note that the "OR" needs to be in capitals.
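If the list of hashtags is dynamic, the search string can be assembled before it is passed to Containing(). A language-neutral sketch in Python follows; the helper name is hypothetical and the same join is a one-liner in C#.

```python
def or_query(hashtags):
    """Join hashtags with an uppercase OR, adding a leading '#' where it is missing."""
    return " OR ".join(f"#{tag.lstrip('#')}" for tag in hashtags)


print(or_query(["heart", "soul"]))  # #heart OR #soul
```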
Whoops. Looks like we forgot to implement OR. Most people use "Containing" as a rote query expression like Matt has demonstrated. If you want us to add more extensions for boolean operators, let us know by filing a bug.
