Solr wrong sort text fields


I have a "text_general" field type in schema.xml:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/><filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
I have stored documents:
document1:
spell = "contro un indice generale dei prezzi salito del 2, 1%. Rincari ben più evidenti, tra i prodotti da bar"
testata = "Mattino di Padova (Il)"
document2:
spell="con i prodotti di qualità vinco la crisi dei consumi Farinetti: con"
testata = "Italia Oggi"
document3
spell = "convenienza Il 2008 porta i primi aumenti nei pre zi L'Ipercoop cresce il listino"
testata = "Nuova Ferrara (La)"
Both the "spell" and "testata" fields have the "text_general" type.
Searching works fine for me:
http://localhost:8080/solr/select?q={!type=edismax qf=spell v='co*'}
But there is a problem with sorting:
http://localhost:8080/solr/select?q={!type=edismax qf=spell v='co*'}&sort=testata desc
It returns this result:
document1:
spell = "contro un indice generale dei prezzi salito del 2, 1%. Rincari ben più evidenti, tra i prodotti da bar"
testata = "Mattino di Padova (Il)"
document2:
spell="con i prodotti di qualità vinco la crisi dei consumi Farinetti: con"
testata = "Italia Oggi"
document3
spell = "convenienza Il 2008 porta i primi aumenti nei pre zi L'Ipercoop cresce il listino"
testata = "Nuova Ferrara (La)"
I don't understand why the sorting does not work properly. It should return the results in this order:
document3
spell = "convenienza Il 2008 porta i primi aumenti nei pre zi L'Ipercoop cresce il listino"
testata = "Nuova Ferrara (La)"
document1:
spell = "contro un indice generale dei prezzi salito del 2, 1%. Rincari ben più evidenti, tra i prodotti da bar"
testata = "Mattino di Padova (Il)"
document2:
spell="con i prodotti di qualità vinco la crisi dei consumi Farinetti: con"
testata = "Italia Oggi"

Sorting does not work well on multivalued or tokenized fields.
Because testata is defined with the text_general field type, it is tokenized, so the sort will not behave as expected. From the Solr wiki:
Sorting can be done on the "score" of the document, or on any
multiValued="false" indexed="true" field provided that field is either
non-tokenized (ie: has no Analyzer) or uses an Analyzer that only
produces a single Term (ie: uses the KeywordTokenizer)
Source: http://wiki.apache.org/solr/CommonQueryParameters#sort
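Use string as the field type for sorting: define a new string field, copy the testata field into it, and sort on testata_sort instead. Reindex the documents afterwards, since copyField is applied at index time.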
<field name="testata_sort" type="string" indexed="true" stored="false"/>
<copyField source="testata" dest="testata_sort" />
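As a rough sketch of the corrected request, assuming the testata_sort field above has been added and the documents reindexed, only the sort parameter needs to change. The helper name build_select_url is hypothetical; the host and query string are the ones from the question.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_select_url(base_url, query, sort_field, direction="desc"):
    """Build a Solr select URL that sorts on the given field."""
    params = {"q": query, "sort": f"{sort_field} {direction}"}
    # urlencode percent-encodes the local-params braces and quotes for us.
    return f"{base_url}?{urlencode(params)}"

url = build_select_url(
    "http://localhost:8080/solr/select",
    "{!type=edismax qf=spell v='co*'}",
    "testata_sort",  # the untokenized string copy, not testata itself
)
print(url)
```

Searching still goes through the analyzed spell field; only the ordering uses the string copy.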

Related

Scrape data as <li> between two known keyword encapsulated as <b> tag

I'm using scrapy to scrape this kind of product page. I want to extract the <li> items that sit between <b>Indication</b> and <b>Contre-indications</b>, and then those between <b>Contre-indications</b> and the next <b></b>, whose keyword is not predictable.
Here is the source code of the requested page.
<article class="col-md-10 col-md-push-1">
<p><b>Caractéristiques des croquettes pour chat Royal Canin Veterinary Diet - Urinary S/O LP 34 :</b>
</p><ul>
<li>struvite.</li>
<li>la vessie.</li>
<li>d'oxalate de calcium.
</li>
<li>maintien de la muqueuse vésicale </li></ul><p></p>
<p><b>Remarques :</b>
</p><ul>
<li> Urinary S/O Feline</li>
<li>chez le chat âgé, rénal avant la prescription de l'Urinary S/O Feline</li></ul><p></p>
<p><b>Indications :</b>
</p><ul>
<li>dissolution des calculs urinaires de struvite</li>
<li>gestion des récidives d’urolithiase à struvite et à oxalate de calcium dans un seul aliment</li></ul><p></p>
<p><b>Contre-indications :</b>
</p><ul>
<li>insuffisance rénale chronique, acidose métabolique</li>
<li>traitement avec des médicaments acidifiant l'urine</li>
<li>lactation, gestation, croissance</li></ul><p></p>
<p><b>Durée du traitement :</b> 5 à 12 semaines sont nécessaires pour obtenir la dissolution des calculs de struvites.<br>
P</p>
</article>
First approach: parse the page as free text with a regex. I didn't manage to get anything useful with this regular expression: (<b>[Ii]ndication[s]{0,1}.*?</b>)([\n\r]*.*)(<b>Contre-[Ii]ndication[s]{0,1}.*?</b>). It worked in the online tester, but re in Python did not find any match, so I moved on.
Second approach: I tried to extract the data with scrapy:
l.add_xpath('contre_indication','//*[@id="description-panel"]/div/article/b[starts-with(text(),"Contre-indications")]/following-sibling::ul/li/text()')
l.add_xpath('contre_indication','//*[@id="description-panel"]/div/article/p/b[starts-with(text(),"Contre-indications")]/following-sibling::ul/li/text()')
l.add_xpath('indication','//*[@id="description-panel"]/div/article/b[starts-with(text(),"Indication")]/following-sibling::ul/li/text()')
l.add_xpath('indication','//*[@id="description-panel"]/div/article/p/b[starts-with(text(),"Indication")]/following-sibling::ul/li/text()')
Sometimes the keyword is in a bare /b and sometimes in a /p/b, which is why there are two XPath expressions for each field.
At best, this gives me all the text inside the <li> elements, with no distinction between Indication and Contre-indications.
Expected output would be :
Indication : ["dissolution des calculs urinaires de struvite","gestion des récidives d’urolithiase à struvite et à oxalate de calcium dans un seul aliment"]
Contre-indication : ["insuffisance rénale chronique, acidose métabolique"..."lactation, gestation, croissance"]
I'm very keen to know a working approach to this kind of problem.
Kind regards

You can accomplish this with XPath selectors:
'//p[contains(b/text(),"Contre-indications")]/following-sibling::ul[1]/li/text()'
Explaining the XPath:
//p - select all paragraph nodes
[contains(b/text(),"Contre-indications")] - keep only those whose child b node's text contains the keyword
/following-sibling::ul[1] - select the first unordered list that is a following sibling of that paragraph
/li/text() - select the text of that list's li children
If you run it in scrapy shell:
$ scrapy shell
> body = ...
> from parsel import Selector
> sel = Selector(text=body)
> sel.xpath('//p[contains(b/text(),"Indication")]/following-sibling::ul[1]/li/text()').extract()
['dissolution des calculs urinaires de struvite', 'gestion des récidives d’urolithiase à struvite et à oxalate de calcium dans un seul aliment']
> sel.xpath('//p[contains(b/text(),"Contre-indications")]/following-sibling::ul[1]/li/text()').extract()
['insuffisance rénale chronique, acidose métabolique', "traitement avec des médicaments acidifiant l'urine", 'lactation, gestation, croissance']
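Since the headings after Contre-indications are not predictable, another option is to group every <li> under whichever <b> heading precedes it and pick the keys you need afterwards. The following is a minimal sketch of that idea using only the standard library's html.parser rather than scrapy selectors; SectionParser and extract_sections are hypothetical names, and the sample markup is a shortened copy of the page from the question.

```python
from html.parser import HTMLParser

class SectionParser(HTMLParser):
    """Collect <li> texts under the most recent <b> heading."""

    def __init__(self):
        super().__init__()
        self.sections = {}      # heading -> list of item texts
        self._current = None    # heading the next <li> belongs to
        self._in_b = False
        self._in_li = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._in_b, self._buf = True, []
        elif tag == "li" and self._current is not None:
            self._in_li, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "b" and self._in_b:
            self._in_b = False
            # "Indications :" -> "Indications"
            heading = "".join(self._buf).strip().rstrip(":").strip()
            self._current = heading
            self.sections.setdefault(heading, [])
        elif tag == "li" and self._in_li:
            self._in_li = False
            text = " ".join("".join(self._buf).split())
            if text:
                self.sections[self._current].append(text)

    def handle_data(self, data):
        if self._in_b or self._in_li:
            self._buf.append(data)

def extract_sections(html):
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return parser.sections

SAMPLE = """
<p><b>Indications :</b>
</p><ul>
<li>dissolution des calculs urinaires de struvite</li></ul><p></p>
<p><b>Contre-indications :</b>
</p><ul>
<li>insuffisance rénale chronique, acidose métabolique</li>
<li>lactation, gestation, croissance</li></ul><p></p>
<p><b>Durée du traitement :</b> 5 à 12 semaines.<br></p>
"""

sections = extract_sections(SAMPLE)
print(sections["Indications"])
print(sections["Contre-indications"])
```

Headings that have no list after them, such as Durée du traitement, simply end up with an empty list.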

Will SOLR perform matching on the street name?

I have a query regarding matching a street name in SOLR.
The actual street name to match is POTTS ROAD EVANSFIELD VIC. I have stored the data in three fields:
street_name_clean : POTTSROADEVANSFIELDVIC
street_name_space : POTTS ROAD EVANSFIELD VIC
street_name : POTTS, ROAD, EVANSFIELD, VIC
The reason for storing the data this way is so that I can perform an exact search, fuzzy search, n-gram search, proximity matching, etc.
I have seen a case where the user inputs POTTROAD (missing the S from the actual street name) and all my searches fail.
Is there a technique to match POTTROAD with the data above? Any help is appreciated.

Thanks to the suggestion from @MatsLindh, I implemented the ShingleFilter as follows:
<fieldType name="text_general_shingle" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="false"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="(\s+)" replacement="" replace="all" />
</analyzer>
</fieldType>
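To see why this helps, here is a small Python simulation of the terms such a chain would plausibly produce. It is only an approximation of the analyzer, not Solr itself: shingles and edit_distance are hypothetical helpers. It also assumes the field is then queried with a fuzzy match of edit distance 1 (for example pottroad~1), which the answer does not show.

```python
def shingles(text, min_size=2, max_size=4):
    """Lowercased word n-grams with the separating spaces removed."""
    words = text.lower().split()
    out = []
    for start in range(len(words)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                out.append("".join(words[start:start + size]))
    return out

def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

terms = shingles("POTTS ROAD EVANSFIELD VIC")
# Indexed terms within one edit of the user's input.
close = [t for t in terms if edit_distance("pottroad", t) <= 1]
print(terms)
print(close)
```

The two-word shingle pottsroad is one insertion away from pottroad, so a fuzzy match against the shingled field can find it even though none of the three original fields could.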

To decode following Solr query

In our hybris schema we use LowerCaseFilterFactory, and the name/description fields are of type "text". Hence, Solr treats 'ConsTRUCTION' and 'construction' the same way.
However, if I search for toysChildren (two different keywords combined), I get many results, whereas toyschildren returns no results.
So I would like to decode the following query to understand what makes it behave that way.
solrQuery-toysChildren:
q=_query_:"\{\!multiMaxScore\+tie%3D0.0\}\(\(code_text\:toysChildren\^90.0\)\
+OR\+\(keywords_text_en_mv\:toysChildren\^100.0\)\
+OR\+\(name_text_en\:toysChildren\)\)\
+OR\+\(\(keywords_text_en_mv\:toysChildren\~\^10.0\)\)\
+OR\+\(\(keywords_text_en_mv\:toysChildren\*\^50.0\)\
+OR\+\(name_text_en\:toysChildren\*\^45.0\)\)\
+OR\+\(\(keywords_text_en_mv\:\"toysChildren\"\^100.0\)\
+OR\+\(name_text_en\:\"toysChildren\"\~0.0\^90.0\)\)"
&sort=score+desc&start=0&rows=100&facet.field=gender_string_mv
&facet.field=price_gbp_string
&facet.field=categoryPath_string_mv
&facet.field=allCategories_string_mv
&facet.field=excludeFromGiftFinder_boolean
&facet.field=productVisible_boolean
&facet.field=category_string_mv
&facet.field=brand_string_mv
&facet.field={!ex%3Dfk8}productType_string
&facet.field=age_string_mv
&facet=true
&fq=productVisible_boolean:true
&fq={!tag%3Dfk8}productType_string:(BUNDLE+OR+REGULAR+OR+ESD)
&fq=(catalogId:"coreProductCatalog"+AND+catalogVersion:"Online")
&facet.sort=count
&facet.mincount=1
&facet.limit=50
&spellcheck=true
&spellcheck.q=toysChildren&spellcheck.dictionary=en
&spellcheck.collate=true
Note: Above is the solrQuery formed up in DefaultFacetSearchStrategy of type "SolrQuery".
Query response is:
{responseHeader={status=0,QTime=18,params={facet.field=[gender_string_mv, price_gbp_string, categoryPath_string_mv, allCategories_string_mv, excludeFromGiftFinder_boolean, productVisible_boolean, category_string_mv, brand_string_mv, {!ex=fk8}productType_string, age_string_mv],spellcheck.dictionary=en,start=0,sort=score desc,fq=[productVisible_boolean:true, {!tag=fk8}productType_string:(BUNDLE OR REGULAR OR ESD), (catalogId:"coreProductCatalog" AND catalogVersion:"Online")],rows=100,version=2,q=_query_:"\{\!multiMaxScore\ tie=0.0\}\(\(code_text\:toysChildren\^90.0\)\ OR\ \(keywords_text_en_mv\:toysChildren\^100.0\)\ OR\ \(name_text_en\:toysChildren\)\)\ OR\ \(\(keywords_text_en_mv\:toysChildren\~\^10.0\)\)\ OR\ \(\(keywords_text_en_mv\:toysChildren\*\^50.0\)\ OR\ \(name_text_en\:toysChildren\*\^45.0\)\)\ OR\ \(\(keywords_text_en_mv\:\"toysChildren\"\^100.0\)\ OR\ \(name_text_en\:\"toysChildren\"\~0.0\^90.0\)\)",facet.limit=50,spellcheck.q=toysChildren,spellcheck=true,facet.mincount=1,facet=true,wt=javabin,facet.sort=count,spellcheck.collate=true}},response={numFound=3,start=0,docs=[SolrDocument{indexOperationId_long=79, id=coreProductCatalog/Online/100310, pk=8796107702273, catalogId=coreProductCatalog, catalogVersion=Online, allCategoryCodes_string=/SM06010425/SM060104/SM0601, price_gbp_string=£0 - £19.99, allCategories_string_mv=[SM06010425, SM0601, SM060104], category_string_mv=[SM06010425, SM0601, SM060104], rating_double=5.0, totalReviews_int=1, productType_string=REGULAR, excludeFromGiftFinder_boolean=true, pictureJson_string={"240":"https://image.smythstoys.com/picture/desktop/100310.jpg","220":"https://image.smythstoys.com/picture/tablet/100310.jpg","180":"https://image.smythstoys.com/picture/mobile/100310.jpg"}, gender_string_mv=[Female], autosuggest_en=[Sylvanian Families, Toys, Fashion & Dolls, Sylvanian Children's Bedroom Furniture], spellcheck_en=[Sylvanian Families, Toys, Fashion & Dolls, With 2 beech-style beds which can be stacked on top of each to 
make, Sylvanian Children's Bedroom Furniture], categoryName_text_en_mv=[Sylvanian Families, Toys, Fashion & Dolls], productVisible_boolean=true, url_en_string=/toys/fashion-and-dolls/sylvanian-families/sylvanian-children-s-bedroom-furniture/p/100310, pictureMap_string={min-width:1200=https://image.smythstoys.com/picture/desktop/100310.jpg, min-width:768=https://image.smythstoys.com/picture/tablet/100310.jpg, max-width:768=https://image.smythstoys.com/picture/mobile/100310.jpg}, priceValue_gbp_double=11.99, categoryNamePath_string_mv=[Toys, Toys > Fashion & Dolls, Toys > Fashion & Dolls > Sylvanian Families], categoryMetaTitle_string=SM06010425_Sylvanian Families: Awesome deals only at Smyths Toys UK, categoryMetaDescription_string=Sylvanian Families! Shop for an excellent range. Watch out for great offers at Smyths Toys UK, code_text=100310, description_text_en=With 2 beech-style beds which can be stacked on top of each to make, name_text_en=Sylvanian Children's Bedroom Furniture, name_sortable_en_sortabletext=Sylvanian Children's Bedroom Furniture, brand_string_mv=[Sylvanian], age_string_mv=[6 - 8 Years, 3 - 5 Years], categoryPath_string_mv=[/SM0601/SM060104, /SM0601, /SM0601/SM060104/SM06010425], customCategoryPath_string_mv=[/curl/toys/c/SM0601, /curl/toys/c/SM0601/curl/toys/fashion-and-dolls/c/SM060104, /curl/toys/c/SM0601/curl/toys/fashion-and-dolls/c/SM060104/curl/toys/fashion-and-dolls/sylvanian-families/c/SM06010425], ukBestsellerRating_en_int=999999, ukBestsellerRating_sortable_en_int=999999, pictureUrl_string=https://image.smythstoys.com/picture/desktop/100310.jpg, _version_=1577946952144781312}, SolrDocument{indexOperationId_long=79, id=coreProductCatalog/Online/100471, pk=8796108128257, catalogId=coreProductCatalog, catalogVersion=Online, allCategoryCodes_string=/SM06010326/SM060103/SM0601, price_gbp_string=£0 - £19.99, allCategories_string_mv=[SM06010326, SM0601, SM060103], category_string_mv=[SM06010326, SM0601, SM060103], rating_double=4.3, 
totalReviews_int=4, productType_string=REGULAR, excludeFromGiftFinder_boolean=false, pictureJson_string={"240":"https://image.smythstoys.com/picture/desktop/100471.jpg","220":"https://image.smythstoys.com/picture/tablet/100471.jpg","180":"https://image.smythstoys.com/picture/mobile/100471.jpg"}, gender_string_mv=[Male], autosuggest_en=[Vtech Infant, Toys, Pre-School & Electronic Learning, Toy Story Mr. Potato Head], spellcheck_en=[Vtech Infant, Toys, Pre-School & Electronic Learning, Includes lots of accessories and a special compartment for, Toy Story Mr. Potato Head], categoryName_text_en_mv=[Vtech Infant, Toys, Pre-School & Electronic Learning], productVisible_boolean=true, url_en_string=/toys/pre-school-and-electronic-learning/vtech-infant/toy-story-mr-potato-head/p/100471, pictureMap_string={min-width:1200=https://image.smythstoys.com/picture/desktop/100471.jpg, min-width:768=https://image.smythstoys.com/picture/tablet/100471.jpg, max-width:768=https://image.smythstoys.com/picture/mobile/100471.jpg}, priceValue_gbp_double=9.99, categoryNamePath_string_mv=[Toys, Toys > Pre-School & Electronic Learning, Toys > Pre-School & Electronic Learning > Vtech Infant], categoryMetaTitle_string=SM06010326_Vtech Infant: Awesome deals only at Smyths Toys UK, categoryMetaDescription_string=Vtech Infant! Shop for an excellent range. Watch out for great offers at Smyths Toys UK, code_text=100471, description_text_en=Includes lots of accessories and a special compartment for, name_text_en=Toy Story Mr. Potato Head, name_sortable_en_sortabletext=Toy Story Mr. 
Potato Head, brand_string_mv=[Toy Story], age_string_mv=[9 - 11 Years, 6 - 8 Years, 3 - 5 Years], categoryPath_string_mv=[/SM0601/SM060103/SM06010326, /SM0601/SM060103, /SM0601], customCategoryPath_string_mv=[/curl/toys/c/SM0601, /curl/toys/c/SM0601/curl/toys/pre-school-and-electronic-learning/c/SM060103, /curl/toys/c/SM0601/curl/toys/pre-school-and-electronic-learning/c/SM060103/curl/toys/pre-school-and-electronic-learning/vtech-infant/c/SM06010326], ukBestsellerRating_en_int=999999, ukBestsellerRating_sortable_en_int=999999, pictureUrl_string=https://image.smythstoys.com/picture/desktop/100471.jpg, _version_=1577946952157364224}, SolrDocument{indexOperationId_long=79, id=coreProductCatalog/Online/100838, pk=8796111962113, catalogId=coreProductCatalog, catalogVersion=Online, allCategoryCodes_string=/SM060307/SM0603, price_gbp_string=£0 - £19.99, allCategories_string_mv=[SM060307, SM0603], category_string_mv=[SM060307, SM0603], rating_double=0.0, productType_string=REGULAR, excludeFromGiftFinder_boolean=false, pictureJson_string={"240":"https://image.smythstoys.com/picture/desktop/100838.jpg","220":"https://image.smythstoys.com/picture/tablet/100838.jpg","180":"https://image.smythstoys.com/picture/mobile/100838.jpg"}, gender_string_mv=[Female], autosuggest_en=[Sports Equipment, Outdoor, 8oz Childrens Boxing Gloves], spellcheck_en=[Sports Equipment, Outdoor, 8oz childrens boxing gloves., 8oz Childrens Boxing Gloves], categoryName_text_en_mv=[Sports Equipment, Outdoor], productVisible_boolean=true, url_en_string=/outdoor/sports-equipment/8oz-childrens-boxing-gloves/p/100838, pictureMap_string={min-width:1200=https://image.smythstoys.com/picture/desktop/100838.jpg, min-width:768=https://image.smythstoys.com/picture/tablet/100838.jpg, max-width:768=https://image.smythstoys.com/picture/mobile/100838.jpg}, priceValue_gbp_double=4.99, categoryNamePath_string_mv=[Outdoor, Outdoor > Sports Equipment], categoryMetaTitle_string=SM060307_Sports Equipment: Awesome deals only at 
Smyths Toys UK, categoryMetaDescription_string=Sports Equipment! Shop for an excellent range. Watch out for great offers at Smyths Toys UK, code_text=100838, description_text_en=8oz childrens boxing gloves., name_text_en=8oz Childrens Boxing Gloves, name_sortable_en_sortabletext=8oz Childrens Boxing Gloves, age_string_mv=[9 - 11 Years, 6 - 8 Years], categoryPath_string_mv=[/SM0603/SM060307, /SM0603], customCategoryPath_string_mv=[/curl/outdoor/c/SM0603, /curl/outdoor/c/SM0603/curl/outdoor/sports-equipment/c/SM060307], ukBestsellerRating_en_int=999999, ukBestsellerRating_sortable_en_int=999999, pictureUrl_string=https://image.smythstoys.com/picture/desktop/100838.jpg, _version_=1577946952254881793}]},facet_counts={facet_queries={},facet_fields={gender_string_mv={Female=2,Male=1},price_gbp_string={£0 - £19.99=3},categoryPath_string_mv={/SM0601=2,/SM0601/SM060103=1,/SM0601/SM060103/SM06010326=1,/SM0601/SM060104=1,/SM0601/SM060104/SM06010425=1,/SM0603=1,/SM0603/SM060307=1},allCategories_string_mv={SM0601=2,SM060103=1,SM06010326=1,SM060104=1,SM06010425=1,SM0603=1,SM060307=1},excludeFromGiftFinder_boolean={false=2,true=1},productVisible_boolean={true=3},category_string_mv={SM0601=2,SM060103=1,SM06010326=1,SM060104=1,SM06010425=1,SM0603=1,SM060307=1},brand_string_mv={Sylvanian=1,Toy Story=1},productType_string={REGULAR=3},age_string_mv={6 - 8 Years=3,3 - 5 Years=2,9 - 11 Years=2}},facet_ranges={},facet_intervals={},facet_heatmaps={}},spellcheck={suggestions={},collations={}}}
schema.xml Snippet 1:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true" />
<filter class="solr.ManagedStopFilterFactory" managed="en" />
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.ManagedSynonymFilterFactory" managed="en" />
<filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.ASCIIFoldingFilterFactory" />
<filter class="solr.SnowballPorterFilterFactory" language="English" />
</analyzer>
</fieldType>
schema.xml Snippet 2:
<field name="text" type="textgen" indexed="true" stored="false" />
schema.xml Snippet 3:
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
schema.xml Snippet 4:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>

Most likely you get different results because of solr.WordDelimiterFilterFactory with splitOnCaseChange="1", which breaks toysChildren into toys and Children. In toyschildren there is no case change, so the only token is toyschildren, and that is exactly what makes the difference in your query results.
You have several options, depending on the behaviour you expect from your system: you could turn this setting off, or remove solr.WordDelimiterFilterFactory from the field type completely.
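The effect can be illustrated with a rough Python approximation. It models only the lowercase-to-uppercase split followed by lowercasing; the real filter does more (for example preserveOriginal and catenateWords also emit extra tokens), and split_on_case_change is a hypothetical helper, not a Solr API.

```python
import re

def split_on_case_change(token):
    """Split at each lowercase-to-uppercase boundary, then lowercase the parts."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", token)
    return [part.lower() for part in spaced.split()]

print(split_on_case_change("toysChildren"))
print(split_on_case_change("toyschildren"))
```

The first input yields two tokens that each match many products, while the second stays a single token that matches nothing in the index.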

Marklogic search:search issue with search terms which containing hyphen(-)

I'm searching for the term compte-courant. I have a thesaurus file with an entry for it whose value is "comptes-courants". When I run the search, it also returns documents that contain only "comptes" or "compte". For those documents, <search:highlight> is not present in the result. Please help me get only the documents that contain compte-courant or comptes-courants. The search options and the result are below.
Search Option
<options xmlns="http://marklogic.com/appservices/search">
<debug>false</debug>
<search-option>score-logtfidf</search-option>
<search-option>unfiltered</search-option>
<term>
<term-option>case-insensitive</term-option>
<term-option>diacritic-insensitive</term-option>
<term-option>punctuation-insensitive</term-option>
</term>
<quality-weight>5.0</quality-weight>
<return-constraints>false</return-constraints>
<return-facets>true</return-facets>
<return-qtext>true</return-qtext>
<return-query>false</return-query>
<return-results>true</return-results>
<return-metrics>true</return-metrics>
<return-similar>false</return-similar>
<transform-results apply="src-snippet" ns="/src-snippet"
at="/src-snippet.xqy">
<per-match-tokens>30</per-match-tokens>
<max-matches>4</max-matches>
<max-snippet-chars>200</max-snippet-chars>
<preferred-elements />
</transform-results>
<additional-query>
<cts:and-query xmlns:cts="http://marklogic.com/cts">
<cts:directory-query depth="infinity">
<cts:uri>/DOCS/</cts:uri>
</cts:directory-query>
<cts:word-query>
<cts:text xml:lang="en">compte-courant</cts:text>
<cts:text xml:lang="en">comptes-courants</cts:text>
</cts:word-query>
</cts:and-query>
</additional-query>
<sort-order type="xs:date" direction="descending">
<element ns="" name="sortabledate" />
<annotation>Sort by Date</annotation>
</sort-order>
<sort-order direction="descending">
<score />
</sort-order>
<grammar>
<starter strength="40" apply="grouping" delimiter=")">(</starter>
<starter strength="10" apply="prefix" element="cts:not-query">-</starter>
<joiner strength="30" apply="infix" element="cts:and-query"
tokenize="word">AND</joiner>
<joiner strength="20" apply="infix" element="cts:or-query"
tokenize="word">OR</joiner>
</grammar>
search:search("", $searchOptions, "1", "4") returns the following result.
Result
<search:response snippet-format="src-snippet" total="640"
start="1" page-length="10" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="" xmlns:search="http://marklogic.com/appservices/search">
<search:result index="1" uri="/DOCS/JK_KAJD-10194_MAR03.xml"
path="fn:doc("/DOCS/JK_KAJD-10194_MAR03.xml")" score="61440"
confidence="0.363169" fitness="0.534633">
<search:snippet>
<src-term />
<tool-tip>
<search:match
path="fn:doc("/DOCS/JK_KAJD-10194_MAR03.xml")/CASEDOC">Cour de justice de l'Union européenne, 4e chambre, 10 Novembre
2016 - n° C-156/15 JK_KAJD-10194_MAR03
</search:match>
</tool-tip>
<title>Cour de justice de l'Union européenne, 4e chambre, 10 Novembre
2016 - n° C-156/15
</title>
</search:snippet>
</search:result>
<search:result index="2" uri="/DOCS/JP_KID-630822_MAR03.xml"
path="fn:doc("/DOCS/JP_KID-630822_MAR03.xml")" score="61440"
confidence="0.363169" fitness="0.534633">
<search:snippet>
<src-term />
<tool-tip>
<search:match
path="fn:doc("/DOCS/JP_KID-630822_MAR03.xml")/CASEDOC">Conseil d'État, 3e sous-section, 16 Juillet 2015 - n° 388760
JP_KID-630822_MAR03
</search:match>
</tool-tip>
<title>Conseil d'État, 3e sous-section, 16 Juillet 2015 - n° 388760
</title>
</search:snippet>
</search:result>
<search:result index="3" uri="/DOCS/JP_KICA-0031257_MAR03.xml"
path="fn:doc("/DOCS/JP_KICA-0031257_MAR03.xml")" score="98304"
confidence="0.459376" fitness="0.676263">
<search:snippet>
<src-term>compte-courant</src-term>
<tool-tip>
<search:match
path="fn:doc("/DOCS/JP_KICA-0031257_MAR03.xml")/CASEDOC/*:body/*:content/*:judgments/*:judgment/*:judgmentbody/*:considerations[2]/p[13]/text">
ALORS QUE, D'AUTRE PART, en considérant que l'enregistrement des
redevances sur un
<search:highlight>compte-courant</search:highlight>
personnel valait mise à disposition des redevances de la location
gérance à M. X....
</search:match>
</tool-tip>
<title>Cour de cassation, 2e chambre civile, 9 Juillet 2015 – n°
14-21.758
</title>
</search:snippet>
</search:result>
<search:result index="4" uri="/DOCS/JP_KASS-0007470_MAR03.xml"
path="fn:doc("/DOCS/JP_KASS-0007470_MAR03.xml")" score="98304"
confidence="0.459376" fitness="0.676263">
<search:snippet>
<src-term>compte-courant</src-term>
<tool-tip>
<search:match>
ALORS QUE, D'AUTRE PART, en considérant que l'enregistrement des
redevances sur un
<search:highlight>compte-courant</search:highlight>
personnel valait mise à disposition des redevances de la location
gérance à M. X....
</search:match>
</tool-tip>
<title>Cour de cassation, 2e chambre civile, 9 Juillet 2015 – n°
14-21.755
</title>
</search:snippet>
</search:result>
<search:qtext />
<search:metrics>
<search:query-resolution-time>PT0S</search:query-resolution-time>
<search:facet-resolution-time>PT0S</search:facet-resolution-time>
<search:snippet-resolution-time>PT0.672S
</search:snippet-resolution-time>
<search:total-time>PT0.672S</search:total-time>
</search:metrics>

I think the issue you are running into is that the default search grammar treats the hyphen as "not with," so it's looking for "compte" or "comptes" without courants. It might be a different issue, but I know we had this on a recent project. Try adding a different search grammar to your options. You can also wrap your search string in quotes so it is taken literally, but then you lose the "fuzzy" matching.
Here's a grammar that should work for this, if the root cause is what I think it is:
<grammar xmlns="http://marklogic.com/appservices/search">
<quotation>"</quotation>
<implicit>
<cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
</implicit>
<starter strength="30" apply="grouping" delimiter=")">(</starter>
<joiner strength="10" apply="infix" element="cts:or-query"
tokenize="word">OR</joiner>
<joiner strength="20" apply="infix" element="cts:and-query"
tokenize="word">AND</joiner>
<joiner strength="30" apply="infix" element="cts:near-query"
tokenize="word">NEAR</joiner>
<joiner strength="30" apply="near2" consume="2"
element="cts:near-query">NEAR/</joiner>
<joiner strength="32" apply="boost" element="cts:boost-query"
tokenize="word">BOOST</joiner>
<joiner strength="35" apply="not-in" element="cts:not-in-query"
tokenize="word">NOT_IN</joiner>
<joiner strength="50" apply="constraint">:</joiner>
<joiner strength="50" apply="constraint" compare="LT"
tokenize="word">LT</joiner>
<joiner strength="50" apply="constraint" compare="LE"
tokenize="word">LE</joiner>
<joiner strength="50" apply="constraint" compare="GT"
tokenize="word">GT</joiner>
<joiner strength="50" apply="constraint" compare="GE"
tokenize="word">GE</joiner>
<joiner strength="50" apply="constraint" compare="NE"
tokenize="word">NE</joiner>
</grammar>

Use termfreq(field,term) function for phrase with space in SOLR 4.1

I am using termfreq(field,term) SOLR function. This works:
?fl=product_name,termfreq(product_name,"iphon")&q=iphone 4s //Found freq
But the problem is a term that contains a space, like "iphone 4s":
?fl=product_name,termfreq(product_name,"iphon 4s")&q=iphone 4s //Return 0 freq
It returns a frequency of 0 although that term (phrase) exists in the document. So the question is: can I use the termfreq() function with a full phrase like "iphone 4s", and how?
I am using Solr 4.1, and the analyzer for the field is:
<fieldType name="text_ws" class="solr.TextField">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
The field is:
<field name="product_name" type="text_ws" indexed="true" stored="true"/>

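Because you are using WhitespaceTokenizerFactory, the value is split on whitespace into iphone and 4s, so "iphone 4s" never exists as a single term in the index.
You could use KeywordTokenizerFactory for indexing, which does not split the value: the whole field value becomes one term, so the phrase is available when it is the entire value.
Otherwise, look at the shingle filter options, which group adjacent words into terms for you.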
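The difference between the three options can be sketched in Python by counting terms the way termfreq would see them. This is a simulation under simplifying assumptions, not Solr code: the helper names and the sample product name are made up for illustration.

```python
from collections import Counter

def whitespace_terms(value):
    """Mimic WhitespaceTokenizer + LowerCaseFilter."""
    return value.lower().split()

def keyword_terms(value):
    """Mimic KeywordTokenizer + LowerCaseFilter: one term per value."""
    return [value.lower()]

def bigram_shingles(value):
    """Mimic two-word shingles over the lowercased tokens."""
    words = value.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]

doc = "Apple iPhone 4s 16GB"  # hypothetical product_name value
phrase = "iphone 4s"

ws_freq = Counter(whitespace_terms(doc))[phrase]
kw_freq = Counter(keyword_terms(doc))[phrase]
sh_freq = Counter(bigram_shingles(doc))[phrase]
print(ws_freq, kw_freq, sh_freq)
```

With whitespace tokens the phrase is never a term. With a keyword tokenizer it only counts when the phrase is the entire field value. With shingles it is found wherever the two words are adjacent.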
